Abstract

In a minimal binary constraint network, every tuple of a constraint relation can be extended to a solution. The tractability or intractability of computing a solution to such a minimal network was a long-standing open question. Dechter conjectured this computation problem to be NP-hard. We prove this conjecture. We also prove a conjecture by Dechter and Pearl stating that for k ≥ 2 it is NP-hard to decide whether a single constraint can be decomposed into an equivalent k-ary constraint network. We show that this holds even in the case of bi-valued constraints where k ≥ 3, which proves another conjecture of Dechter and Pearl. Finally, we establish the tractability frontier for this problem with respect to the domain cardinality and the parameter k.

Keywords: Constraint satisfaction, constraint network, CSP, minimal network, 
complexity, join decomposition, structure identification, relational data, relational 
1. Introduction

This paper deals with problems related to minimal constraint networks. First, the 
complexity of computing a solution to a minimal network is determined. Then, the 
problems of recognizing network minimality and network-decomposability are studied. 

1.1. Minimal constraint networks 

In his seminal 1974 paper [26], Montanari introduced the concept of a minimal constraint network. Roughly, a minimal network is a constraint network where each partial instantiation corresponding to a tuple of a constraint relation can be extended to a solution. Each arbitrary binary network N having variables {X_1, ..., X_v} can be transformed into an equivalent binary minimal network M(N) by computing the set sol(N) of all solutions to N and creating, for 1 ≤ i < j ≤ v, a constraint c_ij whose scope is (X_i, X_j) and whose constraint relation consists of the projection of sol(N) to (X_i, X_j), and, for 1 ≤ i ≤ v, a unary constraint c_i whose scope is (X_i) and whose constraint relation is the projection of sol(N) over (X_i). The minimal network M(N) is unique, and its solutions are exactly those of the original network, i.e., sol(N) = sol(M(N)).
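The construction of M(N) can be made concrete with a small brute-force sketch. It assumes a binary network is stored as a Python dict mapping each scope (i, j), i < j, to the set of allowed value pairs, and that all variables share one finite domain; the relation data is the network N of Figure 1(a). The helper names `solutions` and `minimal_network` are hypothetical, and the unary projections are omitted for brevity. Enumerating sol(N) is exponential in the number of variables, which is exactly why M(N) is expensive to build.

```python
from itertools import combinations, product

# Network N of Figure 1(a): scope (i, j), i < j -> allowed (X_i, X_j) pairs.
N = {
    (1, 2): {(1, 1), (1, 2), (1, 3), (1, 5), (2, 1), (2, 5), (3, 4)},
    (2, 3): {(1, 1), (1, 2), (2, 2), (3, 1), (4, 1), (4, 3), (4, 4)},
    (1, 3): {(1, 2), (2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3)},
    (3, 4): {(1, 2), (2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3)},
}
DOMAIN = range(1, 6)
VARS = (1, 2, 3, 4)


def solutions(network, variables, domain):
    """Return sol(N): every full instantiation satisfying all constraints."""
    sols = []
    for values in product(domain, repeat=len(variables)):
        theta = dict(zip(variables, values))
        if all((theta[i], theta[j]) in rel for (i, j), rel in network.items()):
            sols.append(values)
    return sols


def minimal_network(network, variables, domain):
    """Return M(N) (binary part): the projection of sol(N) onto each pair i < j."""
    sols = solutions(network, variables, domain)
    pos = {x: k for k, x in enumerate(variables)}
    return {
        (i, j): {(s[pos[i]], s[pos[j]]) for s in sols}
        for i, j in combinations(variables, 2)
    }


M = minimal_network(N, VARS, DOMAIN)
print(solutions(N, VARS, DOMAIN))  # [(1, 1, 2, 1), (1, 2, 2, 1), (2, 1, 1, 2), (3, 4, 1, 2)]
print(sorted(M[(1, 4)]))           # [(1, 1), (2, 2), (3, 2)]
```

Note that M has a constraint for every pair of variables, including pairs such as (X_1, X_4) that carry no constraint in N.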

An example of a binary constraint network N is given in Figure 1(a). This network has four variables X_1, ..., X_4 which, for simplicity, all range over the same numerical domain {1, 2, 3, 4, 5}. Its solution set sol(N), which is the join of all relations of N, is shown in Figure 1(b). The minimal network M(N) is shown in Figure 1(c).



(a) Binary constraint network N

X1 X2    X2 X3    X1 X3    X3 X4
 1  1     1  1     1  2     1  2
 1  2     1  2     2  1     2  1
 1  3     2  2     3  1     3  1
 1  5     3  1     3  2     3  2
 2  1     4  1     4  1     4  1
 2  5     4  3     4  2     4  2
 3  4     4  4     4  3     4  3

(b) Solution sol(N) of N

X1 X2 X3 X4
 1  1  2  1
 1  2  2  1
 2  1  1  2
 3  4  1  2

(c) Minimal network M(N)

X1 X2    X2 X3    X1 X3    X1 X4    X2 X4    X3 X4
 1  1     1  1     1  2     1  1     1  1     1  2
 1  2     1  2     2  1     2  2     1  2     2  1
 2  1     2  2     3  1     3  2     2  1
 3  4     4  1                       4  2

Figure 1



Obviously, M(N), which can be regarded as an optimally pruned version of N, is hard to compute. But computing M(N) may result in a quite useful knowledge compilation [15]. In fact, with M(N) at hand, we can answer in polynomial time a number of queries that would otherwise be NP-hard. Typically, these are queries that involve one or two variables only. For example, the queries "is there a solution for which X_4 < 3?" and "does N have a solution for which X_2 < X_1?" are answered affirmatively by a simple lookup in the relevant tables of M(N). For the latter query, one just has to look into the first relation table of M(N), whose tuple (2, 1) constitutes a witness. In contrast, in our example, the query "is there a solution for which X_1 < X_4?" is immediately recognized to have a negative answer, as the fourth relation of M(N) has no tuple witnessing this inequality. An example of a slightly more involved, non-Boolean two-variable query that can be answered in polynomial time using M(N) is: "what is the maximal value of X_2 such that X_4 is minimum over all solutions?" Again, one can just "read off" the answer from the single relation of M(N) whose variables are those of the query. In our example in Figure 1, this is the penultimate relation of M(N), from which one easily deduces that the answer is 2.
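Such lookups can be sketched as single scans over one relation of M(N). The sketch below is valid only because every tuple of a minimal network extends to a solution and every solution projects onto some tuple. It assumes M(N) is a dict keyed by scope (i, j), filled here with the relations of M(N) from Figure 1(c), which are the projections of the four solutions in Figure 1(b). The function names are hypothetical.

```python
# Relations of M(N) for the network of Figure 1, keyed by scope (i, j).
M = {
    (1, 2): {(1, 1), (1, 2), (2, 1), (3, 4)},
    (2, 3): {(1, 1), (1, 2), (2, 2), (4, 1)},
    (1, 3): {(1, 2), (2, 1), (3, 1)},
    (1, 4): {(1, 1), (2, 2), (3, 2)},
    (2, 4): {(1, 1), (1, 2), (2, 1), (4, 2)},
    (3, 4): {(1, 2), (2, 1)},
}


def exists(i, j, pred):
    """Is there a solution whose (X_i, X_j) values satisfy pred? One table scan."""
    return any(pred(a, b) for a, b in M[(i, j)])


def witness(i, j, pred):
    """Return a witnessing (X_i, X_j) tuple, or None if no solution satisfies pred."""
    return next(((a, b) for a, b in sorted(M[(i, j)]) if pred(a, b)), None)


def max_first_given_min_second(i, j):
    """Maximal value of X_i over the solutions in which X_j takes its minimum."""
    min_j = min(b for _, b in M[(i, j)])
    return max(a for a, b in M[(i, j)] if b == min_j)


print(exists(3, 4, lambda x3, x4: x4 < 3))     # True
print(witness(1, 2, lambda x1, x2: x2 < x1))   # (2, 1)
print(exists(1, 4, lambda x1, x4: x1 < x4))    # False
print(max_first_given_min_second(2, 4))        # 2
```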

1.2. Computing solutions to minimal constraint networks 

In applications such as computer-supported interactive product configuration, such queries arise frequently, but it would be useful to be able to exhibit, together with the query answer, a full solution, that is, an assignment of values to all variables witnessing this answer. However, it was not even clear whether the following problem is tractable: Given a nonempty minimal network M(N), compute an arbitrary solution to it. Gaur [11] formulated this as an open problem. He showed that a stronger version of the problem, where solutions restricted by specific value assignments to a pair of variables are sought, is NP-hard, but speculated that finding arbitrary solutions could be tractable. However, since the introduction of minimal networks in 1974, no one has come up with a polynomial-time algorithm for this task. This led Dechter to conjecture that this problem is hard [8]. Note that this problem deviates in two ways from classical decision problems: First, it is a search problem rather than a decision problem, and second, it is a promise problem, where it is "promised" that the input networks, which constitute our problem instances, are indeed minimal, a promise whose verification is itself NP-hard (see Section 4.1). We therefore have to clarify what NP-hardness means when referring to such problems. The simplest and probably cleanest definition is the following: The problem is NP-hard if any polynomial-time algorithm solving it would imply the existence of a polynomial-time algorithm for NP-hard decision problems, and would thus imply P = NP. In the light of this, Dechter's conjecture reads as follows:

Conjecture 1.1 (Dechter [8]). Unless P = NP, computing a single solution to a nonempty minimal constraint network cannot be done in polynomial time.

While the problem has interested a number of researchers, it had not been solved until recently. Some progress was made by Bessiere in 2006. In his well-known handbook article "Constraint Propagation" [4], he used results of Cros [6] to show that no backtrack-free algorithm for computing a solution from a minimal network can exist unless the Polynomial Hierarchy collapses to its second level (more precisely, unless Σ_2^P = Π_2^P). However, this does not mean that the problem is intractable. A backtrack-free algorithm in the sense of Bessiere must be able to recognize each partial assignment that is extensible to a solution. In a sense, such an algorithm, even if it computes only one solution, must have the potential to compute all solutions just by changing the choices of the variable instantiations made at the different steps. In more colloquial terms, backtrack-free algorithms in the sense of Bessiere must be fair to all solutions. Bessiere's result does not preclude the existence of a less general algorithm that computes just one solution while being unable to recognize all partial assignments, and thus being unfair to some solutions.

The simple example in Figure 1, by the way, shows that the following naive backtrack-free strategy is doomed to fail: Pick an arbitrary tuple from the first relation of M(N), expand it by a suitable tuple of the second relation, and so on. In fact, if we just picked the first tuple (1, 1) of the first relation, we could combine it with the first tuple (1, 1) of the second relation and obtain the partial instantiation X_1 = X_2 = X_3 = 1. However, this partial instantiation is not part of a solution, as it cannot be expanded to match any tuple of the third relation. While this naive strategy fails, one may still imagine the existence of a more sophisticated backtrack-free strategy that precomputes in polynomial time some helpful data structure before embarking on choices. However, as we show in this paper, such a strategy cannot exist unless NP = P.
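The failure of the naive strategy can be replayed in a few lines. This is a sketch under stated assumptions: the first three relations of M(N) are listed in the order used in Figure 1(c) (X_1X_2, X_2X_3, X_1X_3), with tuples in ascending order, and `naive_greedy` is a hypothetical name for the strategy described in the paragraph above.

```python
# First three relations of M(N) for Figure 1, in the order they are listed.
ORDERED_RELATIONS = [
    ((1, 2), [(1, 1), (1, 2), (2, 1), (3, 4)]),
    ((2, 3), [(1, 1), (1, 2), (2, 2), (4, 1)]),
    ((1, 3), [(1, 2), (2, 1), (3, 1)]),
]


def naive_greedy(relations):
    """Naive backtrack-free strategy: for each relation in the given order, take
    its first tuple compatible with the values fixed so far and never revise a
    choice. Returns (partial instantiation, scope where it got stuck), with None
    as the second component if every relation could be matched."""
    theta = {}
    for (i, j), tuples in relations:
        match = next(((a, b) for a, b in tuples
                      if theta.get(i, a) == a and theta.get(j, b) == b), None)
        if match is None:
            return theta, (i, j)  # dead end: earlier choices cannot be extended
        theta[i], theta[j] = match
    return theta, None


print(naive_greedy(ORDERED_RELATIONS))  # ({1: 1, 2: 1, 3: 1}, (1, 3))
```

Every tuple picked is extensible on its own, yet the combination X_1 = X_2 = X_3 = 1 is not; minimality guarantees extensibility of single tuples only, not of arbitrary unions of them.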

In the first part of this paper, we prove Dechter's conjecture by showing that every polynomial-time search algorithm A that computes a single solution to a minimal network can be transformed into a polynomial-time decision algorithm A* for the classical satisfiability problem 3SAT. The proof is carried out in Section 3. We first show that each 3SAT instance can be transformed in polynomial time into an equivalent one that is highly symmetric (Section 3.1). Such symmetric instances, which we call k-supersymmetric, are then polynomially reduced to the problem of computing a solution to a minimal binary constraint network (Section 3.2). We further consider the case of bounded domains, that is, when the input instances are such that the cardinality of the overall domain of all values that may appear in the constraint relations is bounded by some fixed constant c. By a simple modification of the proof of the general case, it is easily seen that even in the bounded-domain case, the problem of computing a single solution remains NP-hard (Section 3.3).

Our hardness results for computing solutions can be reformulated in terms of database theory. Every constraint network N can be seen as a relational database instance, where each constraint of N corresponds to a single relation instance. The set sol(N) of all solutions to a binary constraint network (or database instance) N is identical to the relation obtained by performing the natural join of all relation instances of N. The minimal network M(N) is then a lossless decomposition of sol(N) according to the join dependency *[S], where S is the schema of M(N). Our main hardness result thus implies that it is coNP-hard to recover an arbitrary single tuple of a relation instance R (called a universal relation) from its lossless decomposition according to a given single join dependency, when only this decomposition is given. Lossless decompositions and universal relations have been studied for many decades, and they were recently related
to hidden variable models in quantum mechanics QJEJ. 
1.3. Minimality checking and structure identification



In Section 4.1 we generalize and slightly strengthen a result by Gaur [11] by showing that it is NP-hard to determine whether a k-ary network is minimal, even in the case of bounded domains.

In Section 4.2 we study the complexity of checking whether a network N consisting of a single constraint relation (typically of arity > k) can be represented by an equivalent k-ary constraint network. Note that this is precisely the case iff there exists a k-ary minimal network M equivalent to N, i.e., one such that sol(M) = sol(N). Dechter and Pearl [9] regarded this problem as a relevant complexity problem of structure identification for relational data, i.e., of checking whether an element of a general class of objects (in this case, data relations) belongs to a structurally simpler subclass (in this case, k-decomposable relations). This problem is equivalent to the database problem of testing whether a given instance of a data relation satisfies a specific join dependency. Dechter and Pearl conjectured that the problem is NP-hard for k ≥ 2. We prove this conjecture by showing the problem to be coNP-complete for each k ≥ 2.

A special case considered in [9] is the one of bi-valued constraints, that is, constraints over the Boolean domain. For bi-valued constraints, the above structure identification problem is equivalent to testing whether a Boolean formula represented by the explicit list of all its models is equivalent to a k-CNF. For k = 2 this problem is known to be tractable (see [17]). Dechter and Pearl [9] conjectured it to be NP-hard for every fixed k ≥ 3. In Section 4.3 we prove this conjecture and show that deciding whether bi-valued relations are k-decomposable is coNP-complete for each fixed k ≥ 3. Moreover, we show in Section 4.4 that the representability of tri-valued constraints (and, more generally, r-valued constraints for r ≥ 3) as a k-ary network is coNP-hard for each fixed k ≥ 2. Put together, our results allow us to trace the precise tractability frontier for the problem of relational structure identification in terms of the domain cardinality and the parameter k. This is visualized in Figure 5 in Section 4.4.



The paper is concluded in Section 5 by a brief discussion of the practical significance of our main result, a proposal for the enhancement of minimal networks, and some hints at possible future research.



2. Preliminaries and basic definitions 

While most of the definitions in this section are adapted from the standard literature on constraint satisfaction, in particular [8, 4], we sometimes use a slightly different notation, which is more convenient for our purposes.

Constraints, networks, and solutions. We assume a totally ordered infinite set (𝒳, ≺) of variables. For X_i, X_j ∈ 𝒳, X_i ≺ X_j means that X_i is smaller than X_j according to the ordering. We assume that all variables of constraint networks are from this set. A k-ary constraint c is a pair (scope(c), rel(c)). The scope scope(c) of c is a sequence (X_{i_1}, X_{i_2}, ..., X_{i_k}) of k distinct variables from 𝒳, where X_{i_1} ≺ X_{i_2} ≺ ··· ≺ X_{i_k}, and where each variable X_{i_j} has an associated finite domain dom(X_{i_j}). The relation rel(c) of c is a subset of the Cartesian product dom(X_{i_1}) × dom(X_{i_2}) × ··· × dom(X_{i_k}). The set {X_{i_1}, ..., X_{i_k}} of all variables occurring in scope(c) is denoted by var(c). Given that each set of variables is totally ordered by ≺, we shall identify each set U of variables, whenever convenient, with the list Ū of its elements ordered according to ≺. We thus may write scope(c) = U instead of scope(c) = Ū. More generally, since the concepts of lists of distinct elements and ordered sets coincide, we may use set-theoretic notation to express facts about such lists. For example, if s denotes a scope, we may write X_i ∈ s to express that X_i is a variable in this scope. If, moreover, U denotes a list (or even an unordered set) of variables, we may write s ⊆ U to say that each variable of s is also an element of U, and so on.

A constraint network N consists of a finite set var(N) = {X_1, ..., X_v} of variables with associated domains dom(X_i) for 1 ≤ i ≤ v, and a set of constraints cons(N) = {c_1, ..., c_m}, where var(c_i) ⊆ var(N) for 1 ≤ i ≤ m.

If U ⊆ var(N) is a set of variables, then dom(U) = ⋃_{X ∈ U} dom(X). The domain dom(N) of a constraint network N is defined by dom(N) = dom(var(N)). The schema of N is the set schema(N) = {scope(c) | c ∈ cons(N)} of all scopes of the constraints of N. If S is a schema, then var(S) denotes the set of all variables in the scopes of S. In particular, if S is the schema of network N, we have var(S) = var(N). We call N binary (k-ary) if arity(c) ≤ 2 (arity(c) ≤ k) for each constraint c ∈ cons(N).

Let N be a constraint network. An instantiation mapping for a set of variables W ⊆ var(N) is a mapping θ: W → dom(W) such that θ(X) ∈ dom(X) for each X ∈ W. We call θ(W) an instantiation of W. An instantiation of a proper subset W of var(N) is called a partial instantiation, while an instantiation of var(N) is called a full instantiation (also total instantiation). A constraint c of N is satisfied by an instantiation mapping θ: W → dom(W) if, whenever var(c) ⊆ W, then θ(scope(c)) ∈ rel(c). An instantiation mapping θ: W → dom(W) is consistent if it is satisfied by all constraints. By abuse of terminology, if θ is understood and is consistent, then we may also say that θ(W) is consistent. A solution to a constraint network N is a consistent full instantiation for N. The set of all solutions of N is denoted by sol(N). N is solvable iff sol(N) ≠ ∅. Whenever useful, we will identify the solution set sol(N) with a single constraint whose scope is var(N) and whose relation consists of all tuples in sol(N). We assume, without loss of generality, that for each set of variables W ⊆ var(N) of a constraint network, there exists at most one constraint c such that var(c) = W. (In fact, if there are two or more constraints with exactly the same variables in the scope, an equivalent single constraint can always be obtained by intersecting the constraint relations.)
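The definitions of satisfaction, consistency, and sol(N) translate directly into code. In the following sketch, a constraint is assumed to be a pair (scope, rel) with scope a tuple of variable names and rel a set of value tuples, and an instantiation mapping θ is a dict defined on W. The helper names are hypothetical, and variable names are assumed to sort in the same order as ≺. Note how a constraint whose variables are not all in W is satisfied vacuously.

```python
from itertools import product


def satisfies(theta, constraint):
    """Is constraint (scope, rel) satisfied by the instantiation mapping theta?

    The constraint only restricts theta when var(c) ⊆ W (the keys of theta);
    otherwise it is satisfied vacuously, as in the definition above."""
    scope, rel = constraint
    if not all(x in theta for x in scope):
        return True
    return tuple(theta[x] for x in scope) in rel


def is_consistent(theta, constraints):
    """theta is consistent iff it is satisfied by all constraints."""
    return all(satisfies(theta, c) for c in constraints)


def sol(domains, constraints):
    """Enumerate sol(N) by brute force, as tuples ordered by sorted variable name."""
    variables = sorted(domains)
    for values in product(*(sorted(domains[x]) for x in variables)):
        if is_consistent(dict(zip(variables, values)), constraints):
            yield values


domains = {"X1": {0, 1}, "X2": {0, 1}, "X3": {0, 1}}
neq = frozenset({(0, 1), (1, 0)})                 # X_i ≠ X_j
cons = [(("X1", "X2"), neq), (("X2", "X3"), neq)]
print(is_consistent({"X1": 0, "X3": 0}, cons))    # True: no constraint fully covered
print(list(sol(domains, cons)))                   # [(0, 1, 0), (1, 0, 1)]
```

The partial instantiation X1 = X3 = 0 is consistent by definition, even though it happens to be extensible; consistency of a partial instantiation in general does not imply extensibility to a solution.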

Complete networks. The complete k-ary schema S_k^U over a set of variables U is the schema consisting of all nonempty constraint scopes of arity at most k contained in U. For example, if U = {X_1, X_2}, then S_2^U = {(X_1), (X_2), (X_1, X_2)}. If the set of variables U is understood, we will write S_k instead of S_k^U. A k-ary constraint network N is complete if its schema is S_k^{var(N)}. For each fixed constant k, each k-ary constraint network N can be transformed by a trivial polynomial reduction into an equivalent complete k-ary network N^+ with sol(N) = sol(N^+). In fact, if ℓ ≤ k, then for each (ordered) set of variables W = {X_{i_1}, ..., X_{i_ℓ}} that is not the scope of any constraint of N, we may just add the trivial constraint ⊤_W with scope(⊤_W) = (X_{i_1}, ..., X_{i_ℓ}) and rel(⊤_W) = dom(X_{i_1}) × dom(X_{i_2}) × ··· × dom(X_{i_ℓ}). For this reason, we may, whenever useful, restrict our attention to complete networks. Some authors, such as Montanari [26], who studies binary networks, assume by definition that all networks are complete; others, such as Dechter [8], make this assumption implicitly.
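The completion N ↦ N^+ is a simple loop over all scopes of S_k. This sketch assumes a network is a dict from scope tuples (in ≺ order) to frozensets of value tuples, and that the variable list is given in ≺ order, so that `itertools.combinations` produces correctly ordered scopes. The name `complete` is hypothetical. For fixed k there are O(|var(N)|^k) scopes, so the reduction is polynomial.

```python
from itertools import combinations, product


def complete(variables, domains, constraints, k):
    """Return the complete k-ary network N+ equivalent to N.

    variables: list in ≺ order; domains: variable -> iterable of values;
    constraints: dict scope tuple -> frozenset of value tuples (at most k-ary).
    Each nonempty scope W of at most k variables without a constraint receives
    the trivial constraint ⊤_W, whose relation is the full Cartesian product."""
    completed = dict(constraints)
    for size in range(1, k + 1):
        for scope in combinations(variables, size):
            if scope not in completed:
                completed[scope] = frozenset(
                    product(*(sorted(domains[x]) for x in scope)))
    return completed


variables = ["X1", "X2", "X3"]
domains = {x: {0, 1} for x in variables}
N = {("X1", "X2"): frozenset({(0, 1), (1, 0)})}
N_plus = complete(variables, domains, N, k=2)
print(len(N_plus))  # 6: the 3 unary and 3 binary scopes of S_2
```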

Intersections of networks, containment, and projections. Let N_1 and N_2 be two constraint networks defined over the same schema S (that is, the same set S of constraint scopes). The intersection M = N_1 ∩ N_2 of N_1 and N_2 is the network having var(M) = var(N_1) = var(N_2), and having a constraint c_s for each s ∈ S, such that scope(c_s) = s and rel(c_s) = rel(c_s^1) ∩ rel(c_s^2), where c_s^1 and c_s^2 are the constraints having scope s of N_1 and N_2, respectively. The intersection of arbitrary families of constraint networks defined over the same schema is defined in a similar way. For two networks N_1 and N_2 over the same schema S, we say that N_1 is contained in N_2, and write N_1 ⊆ N_2, if for each s ∈ S, and for c_1 ∈ cons(N_1) and c_2 ∈ cons(N_2) with scope(c_1) = scope(c_2) = s, rel(c_1) ⊆ rel(c_2). If c is a constraint over a set of variables W = {X_1, ..., X_v} and V ⊆ W, then the projection Π_V(c) is the constraint whose scope is V and whose relation is the projection over V of rel(c). Let c be a constraint and S a schema consisting of one or more scopes contained in scope(c); then Π_S(c) = {Π_s(c) | s ∈ S}.
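Projection, intersection, and containment each take only a line or two. The sketch below assumes networks are dicts mapping scope tuples to frozensets of value tuples; all function names are hypothetical. As in the definitions, intersection and containment are only defined for networks over the same schema.

```python
def project(scope, rel, sub):
    """Π_V(c): the relation of constraint (scope, rel) projected onto sub-scope V."""
    idx = [scope.index(x) for x in sub]
    return frozenset(tuple(t[i] for i in idx) for t in rel)


def project_schema(scope, rel, schema):
    """Π_S(c) = {Π_s(c) | s ∈ S}, returned as a network: scope -> relation."""
    return {s: project(scope, rel, s) for s in schema}


def intersect(n1, n2):
    """N1 ∩ N2 for two networks over the same schema."""
    if n1.keys() != n2.keys():
        raise ValueError("networks must be defined over the same schema")
    return {s: n1[s] & n2[s] for s in n1}


def contained(n1, n2):
    """N1 ⊆ N2: same schema and rel(c1) ⊆ rel(c2) for every scope."""
    return n1.keys() == n2.keys() and all(n1[s] <= n2[s] for s in n1)


scope = ("X1", "X2", "X3")
rel = {(0, 0, 1), (0, 1, 1), (1, 1, 0)}
S = [("X1", "X2"), ("X1", "X3"), ("X2", "X3")]
P = project_schema(scope, rel, S)
print(sorted(P[("X1", "X3")]))  # [(0, 1), (1, 0)]
```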

Minimal networks. Let c be a constraint with var(c) = U. The projection Π_{S_k^U}(c) will henceforth just be denoted by Π_{S_k}(c). Thus Π_{S_k}(c) is the constraint network obtained by projecting c over all scopes in the schema S_k^U (simply denoted by S_k), i.e., over all nonempty ordered lists of at most k variables from var(c). In particular, the constraints of Π_{S_2}(c) are precisely all Π_W(c) such that W ⊆ var(c) is a unary or binary scope.

It was first observed in [26] that for each binary constraint network N, there is a unique binary minimal network M(N) that consists of the intersection of all binary networks N' over schema S_2 for which sol(N') = sol(N). Minimality here is with respect to the above-defined "⊆" relation among binary networks. More generally, for each k-ary network N there is a unique k-ary minimal network M_k(N) that is the intersection of all k-ary networks N' over schema S_k for which sol(N') = sol(N). (For the special case k = 2 we have M_2(N) = M(N).) The following is well known [26, 27, 11] and easy to see:

• M_k(N) = Π_{S_k}(sol(N)).

• M_k(N) ⊆ N' for all k-ary networks N' over schema S_k with sol(N') = sol(N).

• A k-ary network N is satisfiable (i.e., has at least one solution) iff M_k(N) is nonempty.

• A k-ary network N is minimal iff Π_{S_k}(sol(N)) = N.

• A k-ary network N is minimal iff M_k(N) = N.

• A network N over schema S_k is minimal iff there exists a universal relation ρ for N, that is, a single constraint ρ such that N = Π_{S_k}(ρ). In this case N is said to be join consistent (see [19]).
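The characterization M_k(N) = Π_{S_k}(sol(N)) gives a direct, exponential-time minimality test for a complete k-ary network. In this sketch, networks are assumed to be dicts keyed by every scope of S_k (tuples in ≺ order), with frozensets of value tuples as relations; the function names are hypothetical, and brute-force enumeration of sol(N) stands in for any real solver.

```python
from itertools import combinations, product


def solutions(variables, domains, network):
    """Brute-force sol(N) for a network given as scope tuple -> relation."""
    for values in product(*(sorted(domains[x]) for x in variables)):
        theta = dict(zip(variables, values))
        if all(tuple(theta[x] for x in s) in rel for s, rel in network.items()):
            yield values


def minimal_k_network(variables, domains, network, k):
    """M_k(N) = Π_{S_k}(sol(N)): project sol(N) onto every scope of S_k."""
    sols = list(solutions(variables, domains, network))
    pos = {x: i for i, x in enumerate(variables)}
    return {
        s: frozenset(tuple(t[pos[x]] for x in s) for t in sols)
        for size in range(1, k + 1)
        for s in combinations(variables, size)
    }


def is_minimal(variables, domains, network, k):
    """A complete k-ary network N is minimal iff Π_{S_k}(sol(N)) = N."""
    return minimal_k_network(variables, domains, network, k) == network


V = ["X1", "X2"]
D = {"X1": {0, 1}, "X2": {0, 1}}
neq = frozenset({(0, 1), (1, 0)})
N1 = {("X1",): frozenset({(0,), (1,)}), ("X2",): frozenset({(0,), (1,)}), ("X1", "X2"): neq}
N2 = {("X1",): frozenset({(0,), (1,)}), ("X2",): frozenset({(0,)}), ("X1", "X2"): neq}
print(is_minimal(V, D, N1, 2), is_minimal(V, D, N2, 2))  # True False
```

In N2 the unary constraint X2 ∈ {0} forces X1 = 1 through the "≠" constraint, so the tuple (0) of the unary relation on X1 cannot be extended to a solution.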






It is obvious that for k ≥ 2, M_k(N) is hard to compute. In fact, just deciding whether, for a network N, M_k(N) is the empty network is coNP-complete, because this decision problem is equivalent to deciding whether N has no solution. (Recall that deciding whether a network N has a solution is NP-complete [23].) In this paper, however, we are not primarily interested in computing M_k(N), but in computing a single solution, in case M_k(N) has already been computed and is known.

Graph theoretic characterization of minimal networks. An n-partite graph is a graph whose vertices can be partitioned into n disjoint sets so that no two vertices from the same set are adjacent. It is well known (see, e.g., [31]) that each binary constraint network N on n variables can be represented as an n-partite graph G_N. The vertices of G_N are possible instantiations of the variables by their corresponding domain values. Thus, for each variable X_i and possible domain value a ∈ dom(X_i), there is a vertex X_i^a. Two vertices X_i^a and X_j^b are connected by an edge in G_N iff the relation of the constraint c_ij with scope (X_i, X_j) contains the tuple (a, b).¹ Gaur [11] gave the following nice characterization of minimal networks: A solvable² complete binary constraint network N on n variables is minimal iff each edge of G_N is part of a clique of size n of G_N. Note that, by definition of G_N as an n-partite graph, there cannot be any clique in G_N with more than n vertices, and thus the cliques of n vertices are precisely the maximum cliques of G_N.
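Gaur's criterion can be checked directly on small networks. The sketch below builds the n-partite graph G_N and, for each edge, tries to grow it into an n-clique by picking one vertex per remaining variable, which is just backtracking search for a solution through that edge. It assumes a solvable complete binary network given as per-variable domains plus a dict from each pair scope to its relation (unary constraints already folded into the domains); all names are hypothetical.

```python
def microstructure(domains, relations):
    """G_N: vertices (X, a); edge {(X_i, a), (X_j, b)} iff (a, b) ∈ rel(c_ij)."""
    vertices = {(x, a) for x, dom in domains.items() for a in dom}
    edges = {
        frozenset({(xi, a), (xj, b)})
        for (xi, xj), rel in relations.items()
        for a, b in rel
        if a in domains[xi] and b in domains[xj]
    }
    return vertices, edges


def extends_to_n_clique(clique, remaining, domains, edges):
    """Can the clique be grown by one vertex for each remaining variable?"""
    if not remaining:
        return True
    x, rest = remaining[0], remaining[1:]
    for a in domains[x]:
        v = (x, a)
        if all(frozenset({v, u}) in edges for u in clique):
            if extends_to_n_clique(clique + [v], rest, domains, edges):
                return True
    return False


def is_minimal_gaur(domains, relations):
    """Gaur's test: every edge of G_N lies in a clique of size n = |var(N)|."""
    _, edges = microstructure(domains, relations)
    variables = list(domains)
    for edge in edges:
        u, v = tuple(edge)
        rest = [x for x in variables if x not in (u[0], v[0])]
        if not extends_to_n_clique([u, v], rest, domains, edges):
            return False
    return True


D = {"X1": {0, 1}, "X2": {0, 1}, "X3": {0, 1}}
neq, eq = {(0, 1), (1, 0)}, {(0, 0), (1, 1)}
print(is_minimal_gaur(D, {("X1", "X2"): neq, ("X2", "X3"): neq, ("X1", "X3"): eq}))  # True
```

Adding the tuple (0, 1) to the relation on (X1, X3) keeps the network solvable but creates an edge that lies in no 3-clique, so the test then returns False.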

Satisfiability problems. An instance C of the Satisfiability (SAT) problem is a conjunction of clauses (often just written as a set of clauses), each of which consists of a disjunction (often written as a set) of literals, i.e., of positive or negated propositional variables. Propositional variables are also called (propositional) atoms. If α is a set of clauses or a single clause, then we denote by propvar(α) the set of all propositional variables occurring in α.

A 3SAT instance is a SAT instance each clause of which is a disjunction of at most 
three literals. 3SAT is the problem of deciding whether a 3SAT instance is satisfiable. 

3. NP-hardness of computing minimal network solutions 

To show that computing a single solution to a minimal network is NP-hard, we will 
do exactly the contrary of what people — or automatic constraint solvers — usually do 
whilst solving a constraint network or a SAT instance. While everybody aims at break- 
ing symmetries, we will actually introduce additional symmetry into a 3SAT instance 
and its corresponding constraint network representation. This will be achieved by the 
Symmetry Lemma to be proved in the next section. 



¹ We disregard unary relations of N here; in fact, each unary relation of a constraint network can be eliminated by appropriately restricting the domain of its scope variable.

² We here refer to solvability according to our definition; Gaur uses a different definition of this term.

3.1. The Symmetry Lemma

The following lemma shows that, for each fixed k ≥ 1, one can transform an arbitrary 3SAT instance C in polynomial time into a satisfiability-equivalent, highly symmetric SAT instance C* such that, whenever C (and thus C*) is satisfiable, each truth value assignment to any k variables of C* can be extended to a truth value assignment satisfying C*. Before stating the lemma, let us formally define this notion of symmetry, which we refer to as supersymmetry.

Definition 3.1. For k ≥ 1, a SAT instance C is k-supersymmetric if C is either unsatisfiable or if, for each set of k propositional variables {p_1, ..., p_k} ⊆ propvar(C) and for each arbitrary truth value assignment η to {p_1, ..., p_k}, there exists a satisfying truth value assignment τ for C that extends η. A SAT instance that is 2-supersymmetric is also called supersymmetric.
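For very small instances, Definition 3.1 can be tested by brute force: C is k-supersymmetric iff it is unsatisfiable or every k-element set of its variables sees all 2^k value combinations among the satisfying assignments. The sketch assumes a DIMACS-like clause encoding (positive integer = atom, negative integer = negated atom), which is a convention chosen here, not the paper's; the function names are hypothetical.

```python
from itertools import combinations, product


def models(clauses, variables):
    """All satisfying assignments (dicts atom -> bool), by brute force."""
    for bits in product((False, True), repeat=len(variables)):
        tau = dict(zip(variables, bits))
        if all(any(tau[abs(l)] == (l > 0) for l in clause) for clause in clauses):
            yield tau


def is_k_supersymmetric(clauses, k):
    """Definition 3.1, checked exhaustively (exponential in the number of atoms)."""
    variables = sorted({abs(l) for clause in clauses for l in clause})
    sols = list(models(clauses, variables))
    if not sols:
        return True  # unsatisfiable instances are k-supersymmetric by definition
    for subset in combinations(variables, k):
        seen = {tuple(tau[p] for p in subset) for tau in sols}
        if len(seen) < 2 ** k:
            return False  # some assignment eta to subset has no extension
    return True


C = [[1, -2, 3], [-1, -2], [2]]     # the example below, with p = 1, q = 2, r = 3
print(is_k_supersymmetric(C, 1))    # False: q = false cannot be extended
```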

Assume k < k'. By the above definition, if a SAT instance C is k'-supersymmetric, then C is also k-supersymmetric. However, a k-supersymmetric SAT instance C is not necessarily also k'-supersymmetric.

Lemma 3.1 (Symmetry Lemma). For each fixed integer k ≥ 1, there is a polynomial-time transformation T that transforms each 3SAT instance C into a k-supersymmetric SAT instance C* such that C is satisfiable iff C* is satisfiable.



We illustrate the proof of Lemma 3.1 by an example. A full proof is given in Appendix A.



Proof. (Illustration by Example) Consider the 3SAT instance C = C_1 ∧ C_2 ∧ C_3, where

C_1 = p ∨ ¬q ∨ r
C_2 = ¬p ∨ ¬q
C_3 = q

Clearly, the above 3SAT instance C, while satisfiable, is not even 1-supersymmetric, and therefore, a fortiori, not k-supersymmetric for any k ≥ 1. To see this, observe that the partial truth value assignment assigning false to q always falsifies clause C_3, and can thus not be extended to a satisfying truth value assignment for C. In the sequel, we illustrate how C can be transformed by a polynomial-time transformation T into a satisfiable supersymmetric SAT instance C* = T(C). To this aim, we associate with each propositional variable v of C a set New(v) of five new propositional variables. In particular, we have:

New(p) = {p_1, p_2, p_3, p_4, p_5},
New(q) = {q_1, q_2, q_3, q_4, q_5}, and
New(r) = {r_1, r_2, r_3, r_4, r_5}.

We now create C* from C by taking the conjunction of all clauses obtained by replacing, in each clause of C, each positive literal v in all possible ways by the disjunction v_i ∨ v_j ∨ v_k of three distinct elements v_i, v_j, v_k ∈ New(v), and by replacing each negative literal ¬v in all possible ways by the disjunction ¬v_i ∨ ¬v_j ∨ ¬v_k, where v_i, v_j, v_k are three distinct elements of New(v). Each clause is thus replaced by a multitude of other clauses that are all taken in conjunction. In particular, in our example, clause C_1 will actually be replaced by the conjunction of the following 1000 clauses C_1^1, ..., C_1^1000:

C_1^1: p_1 ∨ p_2 ∨ p_3 ∨ ¬q_1 ∨ ¬q_2 ∨ ¬q_3 ∨ r_1 ∨ r_2 ∨ r_3;
C_1^2: p_1 ∨ p_2 ∨ p_3 ∨ ¬q_1 ∨ ¬q_2 ∨ ¬q_3 ∨ r_1 ∨ r_2 ∨ r_4;
...
C_1^1000: p_3 ∨ p_4 ∨ p_5 ∨ ¬q_3 ∨ ¬q_4 ∨ ¬q_5 ∨ r_3 ∨ r_4 ∨ r_5.

Similarly, clause C_2 = ¬p ∨ ¬q is replaced by the following 100 clauses C_2^1, ..., C_2^100:

C_2^1: ¬p_1 ∨ ¬p_2 ∨ ¬p_3 ∨ ¬q_1 ∨ ¬q_2 ∨ ¬q_3;
C_2^2: ¬p_1 ∨ ¬p_2 ∨ ¬p_3 ∨ ¬q_1 ∨ ¬q_2 ∨ ¬q_4;
...
C_2^100: ¬p_3 ∨ ¬p_4 ∨ ¬p_5 ∨ ¬q_3 ∨ ¬q_4 ∨ ¬q_5.

Finally, clause C_3 = q is replaced by the following 10 clauses C_3^1, ..., C_3^10:

C_3^1: q_1 ∨ q_2 ∨ q_3;
C_3^2: q_1 ∨ q_2 ∨ q_4;
...
C_3^10: q_3 ∨ q_4 ∨ q_5.

The SAT instance C* = T(C) then consists of the conjunction of all these clauses:

C* = C_1^1 ∧ ... ∧ C_1^1000 ∧ C_2^1 ∧ ... ∧ C_2^100 ∧ C_3^1 ∧ ... ∧ C_3^10.
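The replacement step of the example can be sketched as follows. Original literals are assumed to be nonzero integers (positive for v, negative for ¬v), and a new literal is encoded as a triple (v, i, positive) standing for the i-th copy of v, possibly negated. The parameters `copies=5` and `width=3` mirror the example for k = 2 only; they are illustrative, and the general construction in Appendix A may use different values. The names `new_literals` and `transform` are hypothetical.

```python
from itertools import combinations, product


def new_literals(lit, copies, width):
    """All disjunctions replacing literal lit: `width` distinct copies of its
    atom, each with the sign of lit, as tuples of (atom, copy, positive)."""
    var, positive = abs(lit), lit > 0
    return [tuple((var, i, positive) for i in subset)
            for subset in combinations(range(1, copies + 1), width)]


def transform(clauses, copies=5, width=3):
    """T(C): replace every literal of every clause in all possible ways."""
    result = []
    for clause in clauses:
        for choice in product(*(new_literals(l, copies, width) for l in clause)):
            result.append(tuple(lit for part in choice for lit in part))
    return result


C = [[1, -2, 3], [-1, -2], [2]]   # p = 1, q = 2, r = 3
C_star = transform(C)
print(len(C_star))                # 1110 = 1000 + 100 + 10
```

Each literal has C(5, 3) = 10 replacements, so a clause with m literals yields 10^m clauses; for 3SAT this is at most 1000 clauses per original clause, which keeps T polynomial.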

We claim, and formally prove in Appendix A, that the above transformation from a 3SAT instance C to a SAT instance C* satisfies the following two key facts:

Fact 1 C* is satisfiable iff C is satisfiable (in our example, C* is thus satisfiable). In fact, each satisfying truth value assignment τ to the propositional variables of C can be transformed into a satisfying truth value assignment τ* to C* as follows: If τ(v) = true, then let τ* assign true to at least three propositional variables in New(v), and false to all others, and if τ(v) = false, then let τ* assign false to at least three propositional variables in New(v), and true to all others. In our example, for instance, consider the truth value assignment τ satisfying C, where τ(p) = false and τ(q) = τ(r) = true. This truth value assignment satisfies r and therefore C_1. The assignment τ* to C* thus assigns true to at least three atoms from New(r) = {r_1, r_2, r_3, r_4, r_5}, say, for example, to {r_1, r_4, r_5}. But, since 3 + 3 > 5, each 3-element subset of New(r) has a nonempty intersection with every other 3-element subset of New(r), and thus with the set of r-atoms of each and every clause C_1^i ∈ {C_1^1, ..., C_1^1000}. Therefore, each such clause C_1^i is satisfied. For example, C_1^1 has r_1 in common with the set {r_1, r_4, r_5}, and so must be satisfied by τ*, since τ*(r_1) = true. A similar argument holds for negative literals. Applying the same type of reasoning to all clauses C_j of C, given that each such C_j has at least one literal satisfied by τ, all clauses C_j^i of C* are satisfied by τ*. In summary, τ* satisfies C*. Vice versa, we show in the full proof that if C* is satisfiable, then so must be C.

Fact 2 C* is supersymmetric. Intuitively, this is due to the great freedom of choice in the truth values assigned to the propositional variables in New(v) when constructing a satisfying assignment τ* for C*, as above, from an assignment τ for C. Imagine, for illustration, that we would like to construct a truth value assignment τ* satisfying our example instance C* such that τ*(p_1) = true and τ*(q_3) = false. Note that no truth value assignment to instance C can actually satisfy p or falsify q. Notwithstanding, we are able to find an appropriate τ* with the desired properties. We start with an arbitrary satisfying truth value assignment τ to C, for example, the one where τ(p) = false and τ(q) = τ(r) = true. To construct τ*, let us first define τ* on the elements of New(p) = {p_1, ..., p_5}. According to the construction rules for τ* in Fact 1, given that τ(p) = false, τ* must assign false to at least three elements of New(p), but not necessarily to all elements of New(p). This leaves us the freedom of assigning true to p_1. So, we can, for example, assign false to p_2, p_3, p_4, and p_5, and true to p_1. Similarly, given that τ(q) = true, τ* must assign true to at least three elements of New(q), which can be done while fulfilling at the same time our requirement that τ*(q_3) = false. For example, let τ*(q_1) = τ*(q_2) = τ*(q_5) = true and τ*(q_3) = τ*(q_4) = false. Finally, the only requirement regarding the truth values assigned by τ* to the elements of New(r) is that at least three of these propositional variables be assigned true. Thus, for example, let τ*(r_1) = τ*(r_2) = ... = τ*(r_5) = true. In summary, it is easy to see (and actually follows from Fact 1) that the truth value assignment τ* constructed this way satisfies C*. Moreover, τ* extends the initially given partial truth value assignment τ*(p_1) = true and τ*(q_3) = false. More generally, for every pair v, w of propositional variables of C*, and for every truth value assignment η to {v, w}, one can construct a truth value assignment τ* that extends η and satisfies C*: since η fixes at most two of the five copies of any propositional variable of C, at least three copies remain free to agree with τ, as Fact 1 requires. This shows that C* is 2-supersymmetric, i.e., supersymmetric.

As is easily seen, the transformation from an arbitrary 3SAT instance C to the corresponding C* is polynomial-time computable. Together with Facts 1 and 2, this informally proves the Symmetry Lemma for k = 2. For k > 2, the proof is analogous. □
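The constructive arguments of Facts 1 and 2 can be checked mechanically on the running example. The sketch below rebuilds C* with the same hypothetical encoding as before (new literal = (atom, copy, positive)), lifts the satisfying assignment τ with τ(p) = false, τ(q) = τ(r) = true by copying τ(v) to all five copies of v, and then overwrites the copies fixed by every possible η on two atoms of C*. It only exercises this single τ and the example instance; it is an illustration, not a proof of Lemma 3.1.

```python
from itertools import combinations, product

C = [[1, -2, 3], [-1, -2], [2]]   # p = 1, q = 2, r = 3
COPIES = range(1, 6)              # New(v) = {v_1, ..., v_5}


def transform(clauses):
    """T from the example: each literal -> every disjunction of 3 of its copies."""
    def options(lit):
        return [tuple((abs(lit), i, lit > 0) for i in s)
                for s in combinations(COPIES, 3)]
    return [tuple(x for part in choice for x in part)
            for clause in clauses for choice in product(*map(options, clause))]


def satisfies(assignment, clauses):
    """assignment maps (atom, copy) -> bool; a literal is (atom, copy, positive)."""
    return all(any(assignment[(v, i)] == pos for v, i, pos in cl) for cl in clauses)


def lift(tau, eta=None):
    """Copy tau(v) to all five copies of v, then overwrite the copies fixed by
    eta. If eta fixes at most two copies of v, at least three still agree with
    tau(v), which is all that the argument of Fact 1 needs."""
    star = {(v, i): val for v, val in tau.items() for i in COPIES}
    star.update(eta or {})
    return star


C_star = transform(C)
tau = {1: False, 2: True, 3: True}            # satisfies C
atoms = [(v, i) for v in tau for i in COPIES]
ok = all(
    satisfies(lift(tau, {a: x, b: y}), C_star)
    for a, b in combinations(atoms, 2)
    for x, y in product((False, True), repeat=2)
)
print(satisfies(lift(tau), C_star), ok)       # True True
```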

REMARK. The concept of supersymmetry is somewhat related to the notions of quadrangle and subquadrangle defined in [30] and further discussed in [20]. A quadrangle is a single constraint c that is satisfied by all value assignments that assign an arbitrary value from dom(X) to each variable X in scope(c). Thus, the constraint relation rel(c) of a quadrangle c simply consists of a Cartesian product of domains. An n-ary constraint c is a subquadrangle if each projection of c to n − 1 or fewer variables from scope(c) is a quadrangle. Generalizing this notion, we define a k-subquadrangle to be a constraint all of whose projections to k variables are quadrangles. In this context, Lemma 3.1 may be reformulated as follows: For each k ≥ 1, every satisfiable 3SAT instance C can be transformed into a satisfiable SAT instance C* whose solution relation sol(C*) is a k-subquadrangle.



3.2. Intractability of computing solutions 

The Symmetry Lemma is used for proving our main intractability result. 

Theorem 3.1. For each fixed constant $k \geq 2$, unless NP = P, computing a single solution
from a minimal $k$-ary constraint network $N$ cannot be done in polynomial time.
The problem remains intractable even if the cardinality of each variable-domain is
bounded by a fixed constant.

Proof. We first prove the theorem for $k = 2$. Assume $A$ is an algorithm that computes
in time $p(n)$, where $p$ is some polynomial, a solution $A(N)$ to each non-empty minimal
binary constraint network $N$ of size $n$. We will construct a polynomial-time 3SAT-solver
$A^*$ from $A$. The theorem then follows.

Let us first define a simple transformation $S$ from SAT instances to equivalent binary
constraint networks. $S$ transforms conjunctions $K = K_1 \wedge \cdots \wedge K_r$ of at least
two clauses into binary constraint networks $S(K) = N_K$ as follows. The set of variables
is $var(N_K) = \{K_1, \ldots, K_r\}$. For each variable $K_i$ of $N_K$,
the domain $dom(K_i)$ consists exactly of all literals appearing in $K_i$. For each distinct
pair of clauses $(K_i, K_j)$, $i < j$, there is a constraint $c_{ij}$ having $scope(c_{ij}) = (K_i, K_j)$
and

$$rel(c_{ij}) = (dom(K_i) \times dom(K_j)) - \{(p, \neg p), (\neg p, p) \mid p \in propvar(K)\}.$$

Moreover, for each variable $K_i$ there is a unary constraint $c_i$ whose scope contains the single
variable $K_i$, such that $rel(c_i) = dom(K_i)$. It is easy to see that $K$ is satisfiable iff $N_K$
is solvable. Basically, $N_K$ is solvable iff we can pick one literal per clause such that
the set of all picked literals contains no atom together with its negation. But this is just
equivalent to the satisfiability of $K$. The transformation $S$ is clearly polynomial-time
computable.
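A small Python sketch may make the transformation $S$ concrete. It is an illustration under our own encoding assumptions, not code from the paper: literals are non-zero integers in DIMACS style ($3$ stands for $p_3$, $-3$ for $\neg p_3$), network variable $i$ stands for clause $K_i$, the unary constraint $c_i$ is represented by the domain of variable $i$, and the binary constraint $c_{ij}$ is stored as a set of allowed pairs. The function names are hypothetical, and the brute-force solver is only meant for tiny inputs.

```python
from itertools import combinations, product

def transform_S(clauses):
    """Encode a CNF as a binary network in the style of transformation S.

    clauses: list of clauses, each a tuple of non-zero ints (DIMACS-style
    literals: 3 stands for p3, -3 for its negation).
    Variable i of the network stands for clause K_i; its domain (the unary
    constraint c_i) is the set of literals of K_i.  For i < j, constraint
    c_ij allows every pair of literals except complementary ones.
    """
    domains = [set(c) for c in clauses]
    constraints = {
        (i, j): {(a, b) for a in domains[i] for b in domains[j] if a != -b}
        for i, j in combinations(range(len(clauses)), 2)
    }
    return domains, constraints

def is_solution(w, domains, constraints):
    """Check a candidate tuple w against all unary and binary constraints."""
    return (len(w) == len(domains)
            and all(w[i] in d for i, d in enumerate(domains))
            and all((w[i], w[j]) in rel for (i, j), rel in constraints.items()))

def solve_brute_force(domains, constraints):
    """Return some solution, or None; exponential, for illustration only."""
    for w in product(*(sorted(d) for d in domains)):
        if is_solution(w, domains, constraints):
            return w
    return None

# (p1 or p2) and (not p1 or p2) is satisfiable; p1 and (not p1) is not.
print(solve_brute_force(*transform_S([(1, 2), (-1, 2)])))  # (1, 2)
print(solve_brute_force(*transform_S([(1,), (-1,)])))      # None
```

A solution picks one literal per clause with no complementary pair among the picks; here $(1, 2)$ corresponds to setting $p_1$ and $p_2$ to true.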

Now consider constraint networks $N_{C^*} = S(C^*)$, where $C^*$ is obtained via transformation
$T$ as in Lemma 3.1 from some 3SAT instance $C$, i.e., $C^* = T(C)$. In a
precise sense, $N_{C^*}$ inherits the high symmetry present in $C^*$. In fact, if $C^*$ is
satisfiable, then, by Lemma 3.1, for every pair $\ell_1, \ell_2$ of non-contradictory literals, there is a
satisfying assignment that makes both literals true. Thus, if $C^*$ (and thus $C$) is
satisfiable, for every constraint $c_{ij}$, we may pick each pair $(\ell_1, \ell_2)$ in $rel(c_{ij})$ as part of a
solution, and thus no such pair is useless. Moreover, if $C^*$ is satisfiable, then, clearly,
each value in the relation of each unary constraint $c_i$ is part of a solution. It follows
that if $C^*$, and thus $C$, is satisfiable, then $M(N_{C^*}) = N_{C^*}$, which means that
$N_{C^*}$ is minimal. We thus have:

(*) If $C$ is satisfiable, then $N_{C^*}$ is non-empty and minimal.
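Minimality in the sense of $M(N) = N$ can be tested naively by enumerating all solutions and checking that every domain value and every allowed pair is used by at least one of them. The Python sketch below is our own illustration, using the same hypothetical encoding as before (a list of domains plus a dictionary of binary constraints keyed by index pairs); the function names are not from the paper. The test is exponential in the number of variables, which is in line with the NP-completeness of minimality recognition stated in Theorem 4.1.

```python
from itertools import product

def all_solutions(domains, constraints):
    """Enumerate every solution of a binary network (exponential time)."""
    for w in product(*(sorted(d) for d in domains)):
        if all((w[i], w[j]) in rel for (i, j), rel in constraints.items()):
            yield w

def is_minimal(domains, constraints):
    """True iff every domain value and every allowed pair occurs in a solution,
    i.e. the network coincides with its minimal network M(N)."""
    sols = list(all_solutions(domains, constraints))
    if any({w[i] for w in sols} != set(d) for i, d in enumerate(domains)):
        return False
    return all({(w[i], w[j]) for w in sols} == set(rel)
               for (i, j), rel in constraints.items())

# S applied to (p1 or p2) and (not p1 or p2): every value and pair is used.
print(is_minimal([{1, 2}, {-1, 2}], {(0, 1): {(1, 2), (2, -1), (2, 2)}}))  # True
# S applied to (p1) and (not p1 or p2): literal -1 of the second clause is useless.
print(is_minimal([{1}, {-1, 2}], {(0, 1): {(1, 2)}}))  # False
```

An unsolvable network with non-empty domains is reported as non-minimal, which matches the reasoning behind Fact (*) and its contrapositive.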

We are now ready to specify our 3SAT-solver $A^*$ that works in polynomial
time, and hence witnesses NP = P. Algorithm $A^*$ is also illustrated by the flowchart in
Figure 2. The input of $A^*$ is a 3SAT instance $C$. We assume here without loss of
generality that $C$ has at least two clauses. $A^*$ works as follows:

1. Apply transformation $T$ to $C$ and get $C^* := T(C)$.
Note: $C^*$ is supersymmetric, and $C^*$ is satisfiable iff $C$ is.

2. Apply transformation $S$ to $C^*$ and get $N_{C^*} := S(C^*)$.
Note: $N_{C^*}$ solvable $\Leftrightarrow$ $C^*$ satisfiable $\Leftrightarrow$ $C$ satisfiable.

3. Run $A$ on input $N_{C^*}$ for $p(|N_{C^*}|)$ steps; denote by $w$ the output at this point.
Note: If $C$ (and thus $C^*$) is satisfiable, then $N_{C^*}$ is a solvable minimal network,
and thus $w$ is a solution to $N_{C^*}$; otherwise $N_{C^*}$ is unsolvable, and $w$ is the
empty string or any string other than a solution to $N_{C^*}$.

4. Check whether $w$ is a solution to $N_{C^*}$.

5. If $w$ is not a solution to $N_{C^*}$, then output "C unsatisfiable" and stop.
Note: In fact, if $w$ is not a solution to $N_{C^*}$, then $N_{C^*}$ is either empty or
non-minimal. By the contrapositive of Fact (*), $C$ must then be unsatisfiable.

6. If $w$ is a solution to $N_{C^*}$, then output "C satisfiable" and stop.
Note: If $w$ is a solution, then $N_{C^*}$ is solvable, and thus $C^*$ and $C$ are satisfiable.

Each step of $A^*$ requires polynomial time only. The polynomial runtime of step 3
depends parametrically on the fixed polynomial $p$. $A^*$ is thus a polynomial-time 3SAT
solver. The theorem for $k = 2$ follows.
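The control flow of $A^*$ can be sketched in Python as a function that takes every ingredient as a parameter. This is a hypothetical sketch, not an implementation of the proof: the transformations $T$ and $S$, the assumed solver $A$ (modelled as a function receiving a step budget and returning whatever it has output by then), the polynomial $p$, the size measure $|N|$ and the solution checker are all stand-ins supplied by the caller.

```python
from typing import Any, Callable, Optional

def a_star(C: Any,
           T: Callable[[Any], Any],
           S: Callable[[Any], Any],
           A: Callable[[Any, int], Optional[Any]],
           p: Callable[[int], int],
           size: Callable[[Any], int],
           is_solution: Callable[[Any, Any], bool]) -> bool:
    """Decide satisfiability of C following steps 1-6 of A*.

    Every argument except C is a hypothetical stand-in: T and S are the two
    transformations, A(N, max_steps) runs the assumed minimal-network solver
    for at most max_steps steps and returns whatever it has output by then
    (None if nothing), p is A's polynomial time bound, and size(N) is |N|.
    """
    c_star = T(C)                      # step 1: supersymmetric C*
    network = S(c_star)                # step 2: binary network N_{C*}
    w = A(network, p(size(network)))   # step 3: run A for p(|N_{C*}|) steps
    if w is not None and is_solution(w, network):  # step 4
        return True                    # step 6: "C satisfiable"
    return False                       # step 5: "C unsatisfiable"
```

The only output of $A$ that the sketch trusts is one that passes the polynomial-time check of step 4; the correctness of the "unsatisfiable" answer rests on Fact (*), which no code can verify on its own.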




Figure 2: Flowchart of the 3SAT-solver A* 
Note that $C^*$, as constructed in the proof of Theorem 3.1, is a 9SAT instance; hence
the cardinality of the domain of each variable of $N_{C^*}$ is bounded by 9.

For $k > 2$, the proof is analogous, the main change being that the transformation
$S$ now creates an $\ell$-ary constraint $c_L$ for each (ordered) set $L$ of $\ell \leq k$ clauses from
$C^*$. The resulting constraint network $N_{C^*} = S(C^*)$, where $C^*$ is as constructed in
Lemma 3.1, then does the job. □



3.3. The case of bounded domains 



Theorem 3.1 states that the problem of computing a solution from a non-empty
minimal binary network is intractable even in case the cardinalities of the domains of
all variables are bounded by a constant. However, the total domain $dom(N)$,
which is the set of all literals of $C^*$, has unbounded cardinality. Notwithstanding this,
the following simple corollary to Theorem 3.1 shows that even in case $|dom(N)|$ is
bounded, computing a single solution from a minimal network $N$ is hard.

Corollary 3.1. For each fixed $k \geq 2$, unless NP = P, computing a single solution from a
minimal $k$-ary constraint network $N$ cannot be done in polynomial time, even in case
$|dom(N)|$ is bounded by a constant.

Proof. We prove the result for $k = 2$; for higher values of $k$, the proof is totally
analogous. The key fact we exploit here is that each variable $K_a$ of $N_{C^*}$ in the proof of
Theorem 3.1 has a domain of exactly nine elements, corresponding to the nine literals
occurring in clause $K_a$ of $C^*$. We "standardize" these domains by simply renaming
the nine literals for each variable by the numbers 1 to 9. We thus get an equivalent
minimal constraint network with a total domain of cardinality 9. □
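The renaming step can be illustrated with a short Python sketch, again under our own encoding assumptions (literals as integers, a list of per-variable domains, binary constraints keyed by index pairs; all names hypothetical). Each variable's values are mapped bijectively onto $1, \ldots, |dom(K_a)|$, and the constraints are translated accordingly. Since the renaming is a bijection per variable, solutions correspond one to one and minimality is preserved; with nine literals per clause, the total domain becomes $\{1, \ldots, 9\}$.

```python
def standardize(domains, constraints):
    """Rename each variable's values to 1..|dom| (in sorted order).

    Returns the new domains, the translated binary constraints, and the
    per-variable codes needed to map solutions back.
    """
    codes = [{v: n for n, v in enumerate(sorted(d), start=1)} for d in domains]
    new_domains = [set(c.values()) for c in codes]
    new_constraints = {
        (i, j): {(codes[i][a], codes[j][b]) for a, b in rel}
        for (i, j), rel in constraints.items()
    }
    return new_domains, new_constraints, codes

def decode(w, codes):
    """Translate a solution of the renamed network back to literals."""
    inverse = [{n: v for v, n in c.items()} for c in codes]
    return tuple(inverse[i][x] for i, x in enumerate(w))

d = [{-3, 5, 7}, {2, -5}]
c = {(0, 1): {(5, 2), (-3, -5)}}
nd, nc, codes = standardize(d, c)
print(nd)                      # [{1, 2, 3}, {1, 2}]
print(decode((2, 2), codes))   # (5, 2)
```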



4. Minimal Network Recognition and Structure Identification 



In this section we first deal with the complexity of recognizing whether a $k$-ary
network $M$ is the minimal network of a $k$-ary network $N$ (Section 4.1). We then study
the problem of deciding whether a $k$-ary network $M$ is the minimal network of a single
constraint (Section 4.2).



4.1. Minimal network recognition 



An algorithmic problem of obvious relevance is recognizing whether a given network
is minimal. Using the graph-theoretic characterization of minimal networks
described in Section 2, Gaur [11] has shown the following for binary networks:

Proposition 4.1 (Gaur [11]). Deciding whether a complete binary network $N$ is minimal
is NP-complete under Turing reductions.

We generalize Gaur's result to the $k$-ary case and slightly strengthen it by showing
NP-completeness under the standard notion of polynomial-time many-one reductions:
Theorem 4.1. For each $k \geq 2$, deciding whether a complete $k$-ary network $N$ is
minimal is NP-complete, even in case of bounded domain sizes.

Proof. Membership in NP is easily seen: we just need to guess a candidate solution
$s_t \in sol(N)$ for each of the polynomially many tuples $t$ of each constraint $c$ of $N$,
and check in polynomial time that $s_t$ is indeed a solution and that the projection
of $s_t$ over $scope(c)$ yields $t$. For proving hardness, revisit the proof of Theorem 3.1.
For each $k \geq 2$, from a 3SAT instance $C$, we there construct in polynomial time a
highly symmetric $k$-ary network $N_{C^*}$ with bounded domain sizes, such that $N_{C^*}$ is
minimal (i.e., $M_k(N_{C^*}) = N_{C^*}$) iff $C$ is satisfiable. The "if" direction is Fact (*);
conversely, if $C$ is unsatisfiable, then $N_{C^*}$ has no solution while its unary constraint
relations are non-empty, so $N_{C^*}$ is not minimal. This is clearly a standard many-one
reduction from 3SAT to network minimality. □
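The polynomial-time verification step of the membership argument can be sketched in Python. In this illustration (our own encoding, hypothetical names), a network over variables $0, \ldots, n-1$ maps each scope, given as a tuple of variable indices and including the unary scopes, to its set of allowed value tuples; the certificate maps each pair (scope, $t$) to the guessed solution $s_t$. The nondeterministic guess itself is of course not modelled: the certificate is simply an input.

```python
def is_solution(s, n, network):
    """s solves the network iff its projection onto every scope is allowed."""
    return len(s) == n and all(
        tuple(s[i] for i in scope) in rel for scope, rel in network.items())

def verify_certificate(n, network, cert):
    """Check an NP certificate for minimality of a complete network.

    network maps each scope (a tuple of variable indices, unary scopes
    included) to its set of allowed value tuples; cert maps each pair
    (scope, t) to a claimed full solution s_t.  The loop performs one
    polynomial check per tuple of the network.
    """
    for scope, rel in network.items():
        for t in rel:
            s = cert.get((scope, t))
            if s is None or not is_solution(s, n, network):
                return False
            if tuple(s[i] for i in scope) != t:  # s_t must project onto t
                return False
    return True

net = {(0,): {(0,), (1,)}, (1,): {(0,), (1,)}, (0, 1): {(0, 1), (1, 0)}}
cert = {((0,), (0,)): (0, 1), ((0,), (1,)): (1, 0),
        ((1,), (0,)): (1, 0), ((1,), (1,)): (0, 1),
        ((0, 1), (0, 1)): (0, 1), ((0, 1), (1, 0)): (1, 0)}
print(verify_certificate(2, net, cert))  # True
```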



A result in database theory similar to Theorem 4.1 was shown in [19], where it was
proven that determining whether a set of database relations is join consistent (i.e.,
admits a universal relation) is NP-complete. This was actually proven for sets of binary
relations, though not over the schema $S_k$. Here we have shown that this also holds for
complete $k$-ary networks, i.e., for sets of relations over the specific schemas $S_k$, for each
$k \geq 2$.

4.2. Structure identification and k-representability 



This section as well as Sections 4.3 and 4.4 are dedicated to the problem of
representing single constraints (or single-constraint networks) through equivalent $k$-ary
minimal networks. By a slight abuse of terminology, when there is no danger of
confusion, we will often identify a single-constraint network $\{\rho\}$ with its unique constraint
$\rho$, and for tuples $t$ of the relation $rel(\rho)$ of the constraint $\rho$, we may write $t \in \rho$ instead
of $t \in rel(\rho)$.

Definition 4.1. A complete $k$-ary network $M$ is a minimal $k$-ary network of $\rho$ iff

1. $sol(M) = \rho$, and

2. every tuple occurring in some constraint $r$ of $M$ is the projection of some tuple
$t$ of $\rho$ over $scope(r)$.

We say that a constraint relation $\rho$ is $k$-representable if there exists a (not necessarily
complete) $k$-ary constraint network $M$ such that $sol(M) = \rho$. The following
proposition seems to be well-known and follows very easily from Definition 4.1 anyway.

Proposition 4.2. Let $\rho$ be a constraint. The following three statements are equivalent:

1. $\rho$ has a minimal $k$-ary network;

2. $sol(\Pi_{S_k}(\rho)) = \rho$;

3. $\rho$ is $k$-representable.

Note that the equivalence of $\rho$ being $k$-representable and of $\rho$ admitting a minimal
$k$-ary network emphasizes the importance and usefulness of minimal networks. In a
sense this equivalence means that the minimal $k$-ary network of $\rho$, if it exists, already
represents all $k$-ary networks that are equivalent to $\rho$.
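Statement 2 of Proposition 4.2 suggests a direct, if exponential, test. The Python sketch below is our own illustration with hypothetical names: a relation $\rho$ is a set of $n$-tuples over variables $0, \ldots, n-1$, $S_k$ is taken to be the complete schema of all scopes with 1 to $k$ variables in index order, and $sol(\Pi_{S_k}(\rho))$ is computed by backtracking over the unary projections while checking each scope as soon as its last variable is assigned. The test relations are the "all equal" relation on three bits, which is 2-decomposable, and the even-parity relation, all of whose 2-projections are full.

```python
from itertools import combinations

def project(rho, scope):
    """Pi_scope(rho): the projection of a set of n-tuples onto index tuple scope."""
    return {tuple(t[i] for i in scope) for t in rho}

def schema_S(n, k):
    """S_k over variables 0..n-1: all scopes of 1 to k variables, in index order."""
    return [s for size in range(1, k + 1) for s in combinations(range(n), size)]

def sol_of_projection(rho, n, k):
    """sol(Pi_{S_k}(rho)): all n-tuples whose projection onto every scope of
    S_k occurs in the corresponding projection of rho (backtracking search)."""
    proj = {s: project(rho, s) for s in schema_S(n, k)}
    candidates = [[v for (v,) in proj[(i,)]] for i in range(n)]
    found = set()

    def extend(prefix):
        i = len(prefix)
        if i == n:
            found.add(tuple(prefix))
            return
        for v in candidates[i]:
            new = prefix + [v]
            # scopes whose last (largest) variable is i become fully assigned now
            if all(tuple(new[j] for j in s) in rel
                   for s, rel in proj.items() if s[-1] == i):
                extend(new)

    extend([])
    return found

def is_k_decomposable(rho, n, k):
    """True iff sol(Pi_{S_k}(rho)) = rho, i.e. rho satisfies *[S_k]."""
    return sol_of_projection(rho, n, k) == set(rho)

parity = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
print(is_k_decomposable({(0, 0, 0), (1, 1, 1)}, 3, 2))  # True
print(is_k_decomposable(parity, 3, 2))                  # False
print(len(sol_of_projection(parity, 3, 2)))             # 8
```

Since $\rho \subseteq sol(\Pi_{S_k}(\rho))$ always holds, the comparison fails exactly when the join of the projections contains a spurious tuple, as for the parity relation, whose pairwise projections admit all eight bit vectors.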
The complexity of deciding whether a minimal $k$-ary network for a relation $\rho$ exists
has been stated as an open problem by Dechter and Pearl in [9]. More precisely,
Dechter and Pearl consider the equivalent problem of deciding whether $sol(\Pi_{S_k}(\rho)) = \rho$
holds, and refer to this problem as a problem of structure identification in relational
data [9]. The idea is to identify the class of relations $\rho$ that have the structural property
of being equivalent to the $k$-ary network $\Pi_{S_k}(\rho)$, and thus of being $k$-representable.
Dechter and Pearl formulated the following conjecture:

Conjecture 4.1 (Dechter and Pearl [9]). For each fixed positive integer $k \geq 2$, deciding
whether $sol(\Pi_{S_k}(\rho)) = \rho$ is NP-hard.³

As already observed by Dechter and Pearl in [9], there is a close relationship between
the $k$-representability of constraint relations and a relevant database problem.
Let us briefly digress on this. It is common knowledge that a single constraint $\rho$ can
be identified with a data relation in the context of relational databases (cf. [8]). The
decomposition of relations plays an important role in the database area, in particular
in the context of normalization [24]. It consists of decomposing a relation $\rho$ without
loss of information into smaller relations whose natural join yields precisely $\rho$. If
$\rho$ is a concrete data relation (i.e., a relational instance), and $S$ is a family of subsets
(subschemas) of the schema of $\rho$, then the decomposition of $\rho$ over $S$ consists of the
projection $\Pi_S(\rho) = \{\Pi_s(\rho) \mid s \in S\}$ of $\rho$ over all schemas in $S$. This decomposition is
lossless iff the natural join of all $\Pi_s(\rho)$ yields precisely $\rho$, or, equivalently, iff $\rho$ satisfies
the join dependency $*[S]$. We can thus reformulate the concept of $k$-decomposability
in terms of database theory as follows: a relation $\rho$ is $k$-decomposable iff it satisfies
the join dependency $*[S_k]$, i.e., iff the decomposition of $\rho$ into schema $S_k$ is lossless.
The following complexity result was shown as early as 1981 in [25]:⁴



Proposition 4.3 (Maier, Sagiv, and Yannakakis [25]). Given a relation $\rho$ and a family
$S$ of subsets of the schema of $\rho$, it is coNP-complete to determine whether $\rho$ satisfies
the join dependency $*[S]$, or equivalently, whether the decomposition of $\rho$ into schema
$S$ is lossless.

Proposition 4.3 is weaker than Conjecture 4.1 and does not by itself imply it, nor
does its proof given in [25]. In fact, Conjecture 4.1 speaks about the very specific
sets $S_k$ for $k \geq 2$, which are neither mentioned in Proposition 4.3 nor used in its proof.
Actually, the NP-hardness proof in [25] transforms 3SAT into the problem of checking
a join dependency $*[S]$ over schema $S = (S_1, \ldots, S_{m+1})$, where one of the relation
schemas, namely $S_{m+1}$, is of unbounded arity (depending on the input 3SAT instance),
while the others are of arity 4. To prove Conjecture 4.1, which refers to the specific
schema $S_k$ in which all relations have arity at most $k$, we thus needed to develop a new
and independent hardness argument.

3 Actually, the conjecture stated in [9] is somewhat weaker: given a relation $\rho$ and an integer $k$, deciding
whether $sol(\Pi_{S_k}(\rho)) = \rho$ is NP-hard. Thus $k$ is not fixed and is part of the input instance. However, from
the context and use of this conjecture in [9] it is clear that Dechter and Pearl actually intend NP-hardness for
each fixed $k \geq 2$.

4 As mentioned by Dechter and Pearl [9], Jeff Ullman has proved this result, too. In fact, Ullman, on
a request by Judea Pearl, while not aware of the specific result in [25], produced a totally independent
proof in 1991, and sent it as a private communication to Pearl. The result is also implicit in Moshe Vardi's
1981 PhD thesis.

Theorem 4.2. For each fixed integer $k \geq 2$, deciding for a single constraint $\rho$ whether
$sol(\Pi_{S_k}(\rho)) = \rho$, that is, whether $\rho$ is $k$-decomposable, is coNP-complete.

Proof. We show that deciding whether $sol(\Pi_{S_k}(\rho)) \neq \rho$ is NP-complete.

Membership. Membership in NP already follows from Proposition 4.3, but we give
a short proof of it here for the sake of self-containment. Clearly, $\rho \subseteq sol(\Pi_{S_k}(\rho))$. Thus
$sol(\Pi_{S_k}(\rho)) \neq \rho$ iff the containment is proper, which means that there exists a tuple
$t_0$ in $sol(\Pi_{S_k}(\rho))$ not contained in $\rho$. One can guess such a tuple $t_0$ in polynomial
time and check in polynomial time that $t_0 \notin \rho$ and that, for each $\ell \leq k$ and each $\ell$-tuple of variables
$(X_{i_1}, \ldots, X_{i_\ell})$ of $var(\rho)$, the projection of $t_0$ to $(X_{i_1}, \ldots, X_{i_\ell})$ is indeed a tuple of the
corresponding constraint of $\Pi_{S_k}(\rho)$. Thus determining whether $sol(\Pi_{S_k}(\rho)) \neq \rho$ is in NP.
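The check performed on a guessed tuple $t_0$ can be written out in Python as follows. This is our own illustration; the name `is_counterexample` and the encoding of $\rho$ as a set of tuples over variables $0, \ldots, n-1$ are assumptions. For fixed $k$ there are $O(n^k)$ scopes, and each projection test scans $\rho$ once, so the check runs in polynomial time.

```python
from itertools import combinations

def is_counterexample(t0, rho, k):
    """Polynomial-time check that t0 lies in sol(Pi_{S_k}(rho)) but not in rho."""
    if t0 in rho:
        return False
    n = len(t0)
    for size in range(1, k + 1):
        for scope in combinations(range(n), size):
            target = tuple(t0[i] for i in scope)
            # the projection of t0 must occur in Pi_scope(rho)
            if not any(tuple(t[i] for i in scope) == target for t in rho):
                return False
    return True

parity = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
print(is_counterexample((1, 1, 1), parity, 2))  # True: witnesses sol != rho
print(is_counterexample((0, 0, 0), parity, 2))  # False: already in rho
```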

Hardness. We first show hardness for the binary case, that is, the case where $k = 2$.
We use the NP-hard problem 3COL of deciding whether a graph $G = (V, E)$ with set
of vertices $V = \{v_1, \ldots, v_n\}$ and edge set $E$ is 3-colorable. Let $G$ be given as input
instance. We assume without loss of generality that $G$ has at least three vertices. Let
$r, g, b$ be three data values standing for the three colors red, green, and blue,
respectively. Let $N^{3COL}$ be the constraint network defined as follows. The set of variables is
$var(N^{3COL}) = \{X_1, \ldots, X_n\}$. The schema of $N^{3COL}$ consists of all exactly binary
scopes $(X_i, X_j)$ where $X_i \prec X_j$, and $dom(X_i) = \{r, g, b\}$ for $1 \leq i \leq n$. Moreover,
for all $1 \leq i < j \leq n$, the constraint $c_{ij}$ with schema $(X_i, X_j)$ has the following
constraint relation $rel(c_{ij}) = r_{ij}$: if $(v_i, v_j) \in E$, then $r_{ij}$ is the set of pairs
representing all legal vertex colorings, i.e., $r_{ij} = \{(r,g), (g,r), (r,b), (b,r), (g,b), (b,g)\}$;
otherwise $r_{ij} = \{r, g, b\}^2$. $N^{3COL}$ is thus a straightforward encoding of 3COL over
this schema, and obviously $G$ is 3-colorable iff $sol(N^{3COL}) \neq \emptyset$. Thus, deciding whether
$sol(N^{3COL}) \neq \emptyset$ is NP-hard.

We construct from $N^{3COL}$ a single constraint $\rho$ with schema $\{X_1, \ldots, X_n\}$ as
follows. The domain $dom(\rho)$ contains the color constants $r$, $g$, and $b$, as well as special
"tuple identifiers" to be detailed below. For each constraint $c_{ij}$ of $N^{3COL}$, and for each
tuple $(a, b) \in r_{ij}$, $\rho$ contains a tuple $t$ whose $X_i$ and $X_j$ values are $a$ and $b$,
respectively, and whose $X_\ell$ value, for all $1 \leq \ell \leq n$, $\ell \neq i$, $\ell \neq j$, is a constant $d^{ab}_{ij}$, different
from all values used in other tuples, whose purpose is to act as a tuple identifier. This
concludes the description of the transformation from a 3COL instance $G = (V, E)$ to
a constraint network $N^{3COL}$ and further to a constraint $\rho$. Clearly, this transformation is
feasible in polynomial time. We claim the following:

CLAIM: $sol(\Pi_{S_2}(\rho)) \neq \rho$ iff $sol(N^{3COL}) \neq \emptyset$ (and thus iff $G$ is 3-colorable).

This claim clearly implies the NP-hardness of deciding $sol(\Pi_{S_k}(\rho)) \neq \rho$ for $k = 2$. Let us
prove that the claim holds.

We start with the "if" direction. Assume $sol(N^{3COL}) \neq \emptyset$. Then $G = (V, E)$ is
3-colorable, and hence there exists a function $f : V \rightarrow \{r, g, b\}$ such that for each edge
$(v_i, v_j) \in E$, $f(v_i) \neq f(v_j)$. Let $t$ be the tuple defined by $t[X_i] = f(v_i)$ for all $1 \leq i \leq n$.
Then, by definition of $t$ and $\rho$, for each $1 \leq i < j \leq n$, $t[X_i, X_j] \in \Pi_{X_i,X_j}(\rho)$ and
for each $1 \leq i \leq n$, $t[X_i] \in \Pi_{X_i}(\rho)$. Therefore, $t \in sol(\Pi_{S_2}(\rho))$. However, $t \notin \rho$,
because each tuple of $\rho$, unlike $t$, has some tuple identifiers as components (recall that $n \geq 3$). It thus
follows that $sol(\Pi_{S_2}(\rho)) \neq \rho$.
Let us now show the "only if" direction of the claim. Assume $sol(\Pi_{S_2}(\rho)) \neq \rho$.
Given that, as already noted, $\rho \subseteq sol(\Pi_{S_2}(\rho))$, there must exist a tuple $t_0 \in sol(\Pi_{S_2}(\rho))$
such that $t_0 \notin \rho$. We show that $t_0$ can contain values from $\{r, g, b\}$ only, and must,
moreover, be a solution to $N^{3COL}$. Assume a tuple identifier $d$ occurs as a component
of $t_0$. By construction of $\rho$, $d$ occurs in precisely one single tuple $t$ of $\rho$. It follows
that each relation of $\Pi_{S_2}(\rho)$ has at most one tuple containing $d$, and therefore the join
of all relations of $\Pi_{S_2}(\rho)$ contains a single tuple only in which $d$ occurs as data value,
namely $t$ itself. Therefore, $t_0 = t$, and hence $t_0 \in \rho$, which contradicts our assumption
that $t_0 \notin \rho$. We have thus shown that $t_0$ cannot contain any tuple identifier at all,
and can be made of "color elements" from $\{r, g, b\}$ only. However, by definition of $\rho$,
each tuple $t_{ij} \in \{r, g, b\}^2$ occurring in a relation with schema $(X_i, X_j)$ of $\Pi_{S_2}(\rho)$ also
occurs in the corresponding relation of $N^{3COL}$, and vice versa. Thus $sol(\Pi_{S_2}(\rho)) \neq \rho$
iff $sol(N^{3COL}) \neq \emptyset$ iff $G$ is 3-colorable, which proves our claim.

For each fixed $k > 2$ we can apply exactly the same line of reasoning. We define
$N_k^{3COL}$ as the complete network on variables $\{X_1, \ldots, X_n\}$ of all $k$-ary correct
"coloring" constraints, where the relation with schema $(X_{i_1}, \ldots, X_{i_k})$ expresses the correct
colorings of vertices $v_{i_1}, \ldots, v_{i_k}$ of graph $G$. We then define $\rho$ in a similar way as
for $k = 2$: each $k$-tuple of a relation of $N_k^{3COL}$ is extended by use of (possibly multiple
occurrences of) a tuple identifier to an $n$-tuple of $\rho$. Given that $k$ is fixed, $\rho$ can be
constructed in polynomial time, and so can $\Pi_{S_k}(\rho)$. It is readily seen that each tuple of
$sol(\Pi_{S_k}(\rho))$ that contains a tuple identifier is already present in $\rho$, because for each
tuple-identifier value $d$, each relation of $\Pi_{S_k}(\rho)$ contains at most one tuple involving
$d$. Hence, any tuple in $sol(\Pi_{S_k}(\rho)) - \rho$ involves values from $\{r, g, b\}$ only, and is a
solution to $N_k^{3COL}$ and thus a valid 3-coloring of $G$. □
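The binary case of the construction can be tried out on tiny graphs. The Python sketch below is our own illustration (all names hypothetical): it builds $\rho$ from a graph on vertices $0, \ldots, n-1$, using a tuple `("d", i, j, a, b)` as the identifier $d^{ab}_{ij}$, and computes $sol(\Pi_{S_2}(\rho))$ by backtracking over the unary and binary projections. For the triangle, the tuples in $sol(\Pi_{S_2}(\rho)) - \rho$ are exactly its six proper 3-colorings; for $K_4$, which is not 3-colorable, the join returns $\rho$ itself, as the CLAIM predicts.

```python
from itertools import combinations, product

COLORS = ("r", "g", "b")

def rho_from_graph(n, edges):
    """Build the single constraint rho of the proof from a graph on 0..n-1.

    For every pair i < j and every pair (a, b) in r_ij, rho gets one n-tuple
    with a at X_i, b at X_j and a fresh identifier at all other positions.
    """
    assert n >= 3  # then every tuple of rho contains an identifier
    edge_set = {frozenset(e) for e in edges}
    rho = set()
    for i, j in combinations(range(n), 2):
        for a, b in product(COLORS, repeat=2):
            if a == b and frozenset((i, j)) in edge_set:
                continue  # equal colors are forbidden on an edge
            ident = ("d", i, j, a, b)  # identifier unique to this tuple
            t = [ident] * n
            t[i], t[j] = a, b
            rho.add(tuple(t))
    return rho

def join_of_binary_projections(rho, n):
    """sol(Pi_{S_2}(rho)) by backtracking over unary and binary projections."""
    unary = [{t[i] for t in rho} for i in range(n)]
    binary = {(i, j): {(t[i], t[j]) for t in rho}
              for i, j in combinations(range(n), 2)}
    found = set()

    def extend(prefix):
        i = len(prefix)
        if i == n:
            found.add(tuple(prefix))
            return
        for v in unary[i]:
            if all((prefix[h], v) in binary[(h, i)] for h in range(i)):
                extend(prefix + [v])

    extend([])
    return found

triangle = rho_from_graph(3, [(0, 1), (1, 2), (0, 2)])
print(len(join_of_binary_projections(triangle, 3) - triangle))  # 6
k4 = rho_from_graph(4, list(combinations(range(4), 2)))
print(join_of_binary_projections(k4, 4) == k4)  # True
```

The backtracking search is exponential in the worst case and serves only to make the CLAIM observable on small instances.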



4.3. The case of bi-valued relations 

Let us now turn our attention to bi-valued relations $\rho$, that is, relations $\rho$ over a
binary domain. As explained in Section 3.2 of [9], such bi-valued relations are of
special interest, as they correspond to Boolean formulas. For example, a 3CNF
can be seen as a bi-valued constraint network of ternary relations, and a single
bi-valued relation $\rho$ corresponds to a DNF. The problem of structure identification in the
bi-valued case thus corresponds to relevant identification and learnability questions
about Boolean formulas; we refer the reader to [9] for details. In this context, it would
be interesting to know whether, or for which parameter $k$, Theorem 4.2 carries over to
the bi-valued case. While the coNP-membership clearly applies to the special case of a
bi-valued $\rho$, the hardness part of that proof uses a multiple-valued relation $\rho$ and does
not allow us to derive a hardness result for the bi-valued case. In fact, the relations
$\rho$ constructed in the proof of Theorem 4.2 from arbitrary 3COL instances are not
bi-valued and actually have unbounded domains $dom(\rho)$ containing $|\rho| + 3$ elements⁵:
the "color constants" $r, g, b$, and the $|\rho|$ tuple identifiers $d^{ab}_{ij}$.

5 Here $|\rho| = |rel(\rho)|$ designates the number of tuples in the constraint relation of the constraint $\rho$.

As noted in [7], for $k = 2$ and bi-valued domains, the problem of deciding
whether $sol(\Pi_{S_k}(\rho)) = \rho$ is tractable. It can actually be reduced to 2SAT. But what
about values $k \geq 3$? Dechter and Pearl made the following conjecture (Conjecture
3.27 in [9]):

Conjecture 4.2 (Dechter and Pearl [9]). For each fixed positive integer $k \geq 3$, deciding
for a bi-valued relation $\rho$ whether $sol(\Pi_{S_k}(\rho)) = \rho$ is NP-hard.⁶

We are able to confirm this conjecture. 

Theorem 4.3. For each fixed integer $k \geq 3$, deciding for a single bi-valued constraint
$\rho$ whether $sol(\Pi_{S_k}(\rho)) = \rho$, that is, whether $\rho$ is $k$-decomposable, is coNP-complete.

The rather involved proof of this theorem is given in Appendix B. The hardness
part is similar in spirit to the one of Theorem 4.2, except for two important changes
that are due to the requirement of a two-valued domain. First, we encode 3SAT rather
than 3COL, in order to achieve a binary domain. However, there is still the problem of
the tuple identifiers (the values $d^{ab}_{ij}$ in the proof of Theorem 4.2). They are values from
an unbounded domain. Second, we therefore use a specific bit-vector encoding that allows us
to represent tuple identifiers in binary format. This is, however, not totally trivial. The
difficulty lies in the fact that in the relations of the projection $\Pi_{S_k}(\rho)$ we no longer
have full bit vectors at our disposal, but only $k$-bit projections of such bit vectors.
Sophisticated coding tricks are used for coping with this problem, and for obtaining a
correct reduction.



Theorem 4.3 has a corollary, which we here formulate in the terminology of Dechter
and Pearl [9].

Corollary 4.1. For fixed $k \geq 3$, the class of $k$-CNFs is not identifiable relative to all
CNFs (unless P = NP).

The above means the following. If a CNF $\phi$ (or, more generally, a Boolean function
$\phi$) is given by the set of all its models (i.e., by a bi-valued relation, each tuple of which
corresponds to a model), then it is NP-hard to decide whether $\phi$ is equivalent to a
$k$-CNF. We refer the reader to [9] for a more detailed account of $k$-CNF identification
and its equivalence to the problem of whether a bi-valued relation $\rho$ is $k$-decomposable.
To conclude this topic, let us note that the representation of a Boolean function $\phi$ by
the explicit set of all its models, i.e., by all satisfying truth value assignments, is also
known as the onset of $\phi$ [32]. The above corollary thus states that, for fixed $k \geq 3$, it is
NP-hard to decide whether a Boolean function specified by its onset is equivalent to a
$k$-CNF.⁷
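The link between $k$-decomposability of a bi-valued relation and $k$-CNF identification can be seen concretely. A Boolean function is equivalent to some $k$-CNF iff it is equivalent to the conjunction of all clauses with at most $k$ literals that it implies, and, given the onset $\rho$, a clause over scope $s$ is implied iff its unique falsifying partial assignment on $s$ does not occur in $\Pi_s(\rho)$; the models of that conjunction are therefore exactly $sol(\Pi_{S_k}(\rho))$. The Python sketch below is our own illustration of this textbook argument, with hypothetical names; it enumerates all $2^n$ assignments and is only meant for tiny $n$.

```python
from itertools import combinations, product

def implied_k_clauses(onset, n, k):
    """All clauses with at most k literals satisfied by every model in onset.

    A clause over scope s is represented by the unique partial assignment
    on s that falsifies it.
    """
    clauses = []
    for size in range(1, k + 1):
        for scope in combinations(range(n), size):
            seen = {tuple(m[i] for i in scope) for m in onset}
            for falsifier in product((0, 1), repeat=size):
                if falsifier not in seen:
                    clauses.append((scope, falsifier))
    return clauses

def models_of(clauses, n):
    """Models of a CNF given by falsifying partial assignments (brute force)."""
    return {a for a in product((0, 1), repeat=n)
            if all(tuple(a[i] for i in s) != f for s, f in clauses)}

def equivalent_to_k_cnf(onset, n, k):
    """True iff the Boolean function with this onset is equivalent to a k-CNF."""
    return models_of(implied_k_clauses(onset, n, k), n) == set(onset)

parity = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
print(equivalent_to_k_cnf(parity, 3, 2))                  # False
print(equivalent_to_k_cnf(parity, 3, 3))                  # True
print(equivalent_to_k_cnf({(0, 0, 0), (1, 1, 1)}, 3, 2))  # True
```

The last example is the function $(x_0 \leftrightarrow x_1) \wedge (x_1 \leftrightarrow x_2)$, which is indeed expressible as a 2-CNF.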



6 Note that in [9], the parameter $k$ is not explicitly required to be fixed; however, from the context it is
clear that the present stronger version of the conjecture was actually intended. Moreover, Conjecture 3.27
in [9] was formulated in terms of $k$-CNFs rather than in a purely relational setting. To avoid additional
definitions and terminology, we have restated an equivalent relational formulation here. In particular, we
have replaced the term $M(\Pi_{S_k}(\rho))$ in the original formulation by the equivalent term $sol(\Pi_{S_k}(\rho))$.

7 While we have not found this result in the literature on Boolean functions, we cannot totally exclude 
that it has been independently derived, maybe in a different context or using a different formalism. 
4.4. Further strengthening and tractability frontier



The technique used to prove Theorem 4.3 can be used to strengthen Theorem 4.2
and to show that it actually also holds for tri-valued constraints $\rho$.

Theorem 4.4. For each fixed integer $k \geq 2$, deciding for a single tri-valued constraint
$\rho$ whether $sol(\Pi_{S_k}(\rho)) = \rho$, that is, whether $\rho$ is $k$-decomposable, is coNP-complete.



The proof of this theorem is given in Appendix C. We there use a transformation
from 3COL for a graph $G = (V, E)$ as described in the proof of Theorem 4.2, by
applying, in addition, similar vectorization techniques as in the proof of Theorem 4.3.



The above result, together with Theorem 4.3, with the fact that the 2-representability
of bi-valued relations is feasible in polynomial time (see [7]), and with the facts that the
0-representability and 1-representability of each network and the $k$-representability of
1-valued networks are trivially tractable, gives us the following precise characterization
of the tractability of deciding whether $sol(\Pi_{S_k}(\rho)) = \rho$.

Theorem 4.5. For the class of $i$-valued relations $\rho$, deciding $sol(\Pi_{S_k}(\rho)) = \rho$ is
tractable iff $i = 1$ or ($i = 2$ and $k \leq 2$). In all other cases, the problem is
coNP-complete.

The following figure illustrates this tractability frontier. 
Figure 3: Tractability frontier for the $k$-decomposability of $i$-valued relations $\rho$.



5. Summary, discussion, and future research 

In this paper we have tackled and solved long standing complexity problems re- 
lated to minimal constraint networks: 

• We solved an open problem posed by Gaur [11] in 1995, and later by Dechter [8],
by proving Dechter's conjecture and showing that computing a solution to a
minimal constraint network is NP-hard.

• We proved a conjecture on structure identification in relational data made in
1992 by Dechter and Pearl [9]. In particular, we showed that for $k \geq 2$, it is
coNP-complete to decide whether for a single constraint (or data relation) $\rho$,
$sol(\Pi_{S_k}(\rho)) = \rho$, and thus whether $\rho$ is $k$-decomposable.

• We also proved a refined conjecture of Dechter and Pearl [9], showing that the
above problem remains coNP-hard even if $\rho$ is a bi-valued constraint, in case
$k \geq 3$. A consequence of this is the NP-hardness of identifying $k$-CNFs relative
to the class of all CNFs (when represented by the explicit enumeration of their
models).

• We finally proved that deciding whether $sol(\Pi_{S_k}(\rho)) = \rho$ is coNP-complete
for tri-valued relations and $k \geq 2$. Together with our other results on structure
identification, this allowed us to trace the precise tractability frontier for this
problem.

We wish to make clear that our hardness result about computing solutions to minimal
networks does not mean that we think minimal networks are useless. To the
contrary, we are convinced that network minimality is a most desirable property when
a solution space needs to be efficiently represented for applications such as
computer-supported configuration [10]. For example, a user interactively configuring a PC
constrains a relatively small number of variables, say, by specifying a maximum price, a
minimum CPU clock rate, and the desired hard disk type and capacity. The user then
wants to quickly know whether a solution exists, and if so, wants to see it. For a $k$-ary
minimal constraint network, the satisfiability of queries involving $k$ variables only can
be decided in polynomial time. However, our Theorem 3.1 states that, unless NP = P, in
case the query is satisfiable, there is no way to witness the satisfiability by a complete
solution in polynomial time (in our example, by exhibiting a completely configured PC satisfying the user
requests).

Our Theorem 3.1 thus unveils a certain deficiency of minimal networks, namely,
the failure of being able to exhibit full solutions. However, we have a strikingly simple
proposal for redressing this deficiency. Rather than just storing $\ell$-tuples $t$ (where $\ell \leq k$)
in a $k$-ary minimal network $M_k(N)$, we may store a full solution $t^+$ with each $\ell$-tuple $t$,
where $t^+$ coincides with $t$ on the $\ell$ variables of $t$. Call this extended minimal network
$M_k^+(N)$. Complexity-wise, $M_k^+(N)$ is not harder to obtain than $M_k(N)$. Moreover,
in practical terms, given that the known algorithms for computing $M_k(N)$ from $N$
need to check for each $\ell$-tuple $t$ whether it occurs in some solution $t^+$, why not just
memorize $t^+$ on the fly for each "good" tuple $t$? Note also that the size of $M_k^+(N)$
is still polynomial, and at most by a factor $|var(N)|$ larger than the size of $M_k(N)$.
One may even go further and store with each tuple $t$ of $M_k(N)$ not just a single solution, but the $K$ best solutions
(according to some predefined preference ordering) whose values coincide with those
of $t$. This allows one to answer top-$K$ queries with at most
$k$ variables in polynomial time, once this extended network has been compiled. An example would
be: show me the 5 cheapest laptops fulfilling $\phi$, where $\phi$ constrains $k$ variables only.
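The idea of storing a witness solution $t^+$ with each tuple can be sketched in a few lines of Python. This toy version is our own illustration and compiles $M_k^+(N)$ by brute-force enumeration of all assignments, which is exponential; a practical system would instead memorize $t^+$ inside whatever algorithm computes $M_k(N)$, as suggested above. The network is abstracted as a hypothetical predicate `satisfies` over full assignments, and variables are numbered $0, \ldots, n-1$. Once compiled, a query fixing between 1 and $k$ variables is answered by a single dictionary lookup.

```python
from itertools import combinations, product

def extended_minimal_network(n, domains, satisfies, k):
    """Compile a toy M_k^+(N) by brute force.

    Maps each pair (scope, values) with 1 <= |scope| <= k that occurs in some
    solution to one full witness solution t+ agreeing with it.  `satisfies`
    is a hypothetical predicate telling whether a full assignment solves N.
    """
    table = {}
    for s in product(*domains):
        if not satisfies(s):
            continue
        for size in range(1, k + 1):
            for scope in combinations(range(n), size):
                # keep the first witness found for this tuple
                table.setdefault((scope, tuple(s[i] for i in scope)), s)
    return table

def select_solution(table, query):
    """Answer SELECT A SOLUTION WHERE X_i = v, ... (1 to k variables) by lookup."""
    scope = tuple(sorted(query))
    return table.get((scope, tuple(query[i] for i in scope)))

# Three variables over {0, 1, 2} that must take pairwise different values.
table = extended_minimal_network(3, [[0, 1, 2]] * 3, lambda s: len(set(s)) == 3, 2)
print(select_solution(table, {0: 2, 2: 0}))  # (2, 1, 0)
print(select_solution(table, {0: 1, 1: 1}))  # None
```

Keeping $K$ best witnesses per tuple instead of one would only change what is stored under each key; the lookup stays the same.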

For practical applications it is not always optimal to consider the complete schemas
$S_k$ as defined here. In the conference version [17] of this paper, $S_k$ was defined to
contain only all exactly $k$-ary relations over a given set of variables, rather than all at most
$k$-ary relations. It is easy to see that the relations with scopes of fewer than $k$ variables
are redundant and can indeed be omitted (they can always be obtained via projections
from the exactly $k$-ary relations). The only reason why we used the complete schemas
in the present journal version is that we wanted to use exactly the same definition as
in the standard references [26, 8]. However, yet more liberal definitions are possible.
For example, Lecoutre [22] defines a constraint network $N$ over an arbitrary schema
$S$ to be minimal if $N = \Pi_S(sol(N))$. Clearly, all our complexity bounds carry over
to this more liberal setting: the lower bounds are directly inherited, as all instances
in our settings are also instances of the more liberal setting, and the upper bounds are
obtained by a trivial adaptation of the proofs of our existing upper bounds.

An interesting problem for future research is the following. We may issue queries of
the following form against $M_k^+(N)$: SELECT A SOLUTION WHERE $\phi$. Here $\phi$ is
some Boolean combination of constraints on the variables of $N$. Queries where $\phi$ is a
simple combination of range restrictions on $k$ variables can be answered in polynomial
time. But there are much more complicated queries that can be answered efficiently, for
example, queries that involve aggregate functions and/or re-use of quantified variables.
It would thus be nice and useful to identify very large classes of queries to $M_k^+(N)$ for
which a single solution, if it exists, can be found in polynomial time.

Another relevant research problem is related to the projection $\Pi_{S^*}(sol(N))$ of the
solution $sol(N)$ of a (not necessarily binary) constraint network $N$ to a user-defined
schema $S^*$, and to the further use of the schema $S^*$ for distributed constraint solving.
The projection of the solution space to specific sets of variables is used in the context
of system configuration, when a system is jointly configured by a number of engineers,
each having access to a projection of the solution space only [33]. The problem of
computing a solution from $\Pi_{S^*}(sol(N))$ is generally NP-hard, and remains NP-hard in many
special cases, e.g., if $N$ is binary and, at the same time, $S^* = S_2$ (see Theorem 3.1).
We would like to investigate relevant restrictions that make this problem tractable. For
some restrictions, this is already known. For example, if $S^*$ has bounded hypertree
width [16, 15, 3], then this problem becomes tractable. If $S^*$ has bounded hinge width,
then computing a solution can even be done in a backtrack-free manner; see Section 3
of [18]. Note that bounded hinge width is a stronger restriction than bounded hypertree
width; for a comparison of these and other hypergraph restrictions, see [14]. Other
decompositions that lead to backtrack-free solution search are the world-set decompositions
discussed in [28] and further generalized in [29]. These decompositions are based on
Cartesian products rather than on joins; therefore, computing solutions is easier than
with project-join decompositions.

There is also the problem of computing the desired projections without computing 
the possibly very large relation sol(N), and, as a special case, computing the minimal 
constraint network M(N) from a given network N. More formally, we would like to 
compute $\Pi_{S^*}(sol(N))$ from $N$ in polynomial space as efficiently as possible, assuming
the relations of $S^*$ are all of bounded arity. There are already promising approaches
to this problem in the literature. In [12, 13], conflict-driven answer set programming
(ASP) techniques are used for this task. In [33], projections of $sol(N)$ are computed via
a SAT solver, and it is shown that this method is feasible for large datasets stemming 
from the automotive industry. However, we expect that a structural analysis of the 
original network N and of the desired projection schema S* could further help to speed 
up this computation. 
Appendix A. Proof of the Symmetry Lemma

Lemma 3.1 (Symmetry Lemma). For each fixed integer $k \geq 1$, there is a polynomial-time transformation $T$ that transforms each 3SAT instance $C$ into a $k$-supersymmetric instance $C^*$ such that $C$ is satisfiable iff $C^*$ is satisfiable.

Proof. We first prove the lemma for $k = 2$. Consider the given 3SAT instance $C$. Create for each propositional variable $p \in propvar(C)$ a set $New(p) = \{p_1, p_2, p_3, p_4, p_5\}$ of fresh propositional variables. Let $Disj^+(p)$ be the set of all disjunctions of three distinct positive atoms from $New(p)$, and let $Disj^-(p)$ be the set of all disjunctions of three distinct negative literals corresponding to atoms in $New(p)$. Thus, for example, $(p_2 \lor p_1 \lor p_5) \in Disj^+(p)$ and $(\neg p_1 \lor \neg p_4 \lor \neg p_5) \in Disj^-(p)$. Note that $Disj^+(p)$ and $Disj^-(p)$ each have exactly $\binom{5}{3} = 10$ elements (we do not distinguish between syntactic variants of equivalent clauses containing the same literals).
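The sets $Disj^+(p)$ and $Disj^-(p)$ can be generated mechanically. The following Python sketch is only an illustration under our own conventions: the fresh atoms of $New(p)$ are named `(p, 1), ..., (p, 2k+1)`, a literal is a pair `(atom, sign)` with `sign=True` for a positive literal, and the helper names `new_vars` and `disj` are hypothetical. The parameter `k` anticipates the general case treated at the end of this proof; `k=2` gives the five-element sets used here.

```python
from itertools import combinations

def new_vars(p, k=2):
    """Fresh copies of p; naming them (p, 1), ..., (p, 2k+1) is our own convention."""
    return [(p, i) for i in range(1, 2 * k + 2)]

def disj(p, positive, k=2):
    """Disj^+(p) (positive=True) or Disj^-(p) (positive=False): all clauses made of
    k+1 distinct atoms of New(p), all with the same sign. A literal is (atom, sign)."""
    return [frozenset((atom, positive) for atom in atoms)
            for atoms in combinations(new_vars(p, k), k + 1)]

print(len(disj("p", True)), len(disj("p", False)))  # 10 10
```

Representing clauses as frozensets of literals means that syntactic variants with the same literals are automatically identified, matching the counting convention stated above.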

Consider the following transformation $T$, which eliminates all original literals from $C$, yielding $C^*$:

Function T:

BEGIN $C' := C$;

WHILE $propvar(C) \cap propvar(C') \neq \emptyset$ DO

{pick any $p \in propvar(C) \cap propvar(C')$; $C' := elim(C', p)$};
Output($C'$)
END.

Here $elim(C', p)$ is obtained from $C'$ and $p$ as follows:

FOR each clause $K$ of $C'$ in which $p$ occurs positively or negatively DO
BEGIN

let $\delta$ be the disjunction of all literals in $K$ different from $p$ and from $\neg p$;¹

if $p$ occurs positively in $K$, replace $K$ in $C'$ by the conjunction $T^+(K)$ of all clauses of the form $\alpha \lor \delta$, where $\alpha \in Disj^+(p)$;

if $p$ occurs negatively in $K$, replace $K$ in $C'$ by the conjunction $T^-(K)$ of all clauses of the form $\alpha \lor \delta$, where $\alpha \in Disj^-(p)$;

END.
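A possible Python rendering of Function T and of $elim$ is sketched below. The clause representation (a frozenset of `(atom, sign)` literals), the fresh-atom naming `(p, i)`, and the names `disj`, `elim`, and `transform` are our own assumptions for illustration; we also assume, as is standard, that no clause contains both $p$ and $\neg p$, and that the original atoms are strings.

```python
from itertools import combinations

def disj(p, positive, k=2):
    """Disj^+(p) or Disj^-(p) over the fresh atoms (p, 1), ..., (p, 2k+1)."""
    atoms = [(p, i) for i in range(1, 2 * k + 2)]
    return [frozenset((a, positive) for a in combo)
            for combo in combinations(atoms, k + 1)]

def elim(clauses, p, k=2):
    """elim(C', p): replace every clause K mentioning p by T^+(K) or T^-(K)."""
    result = []
    for clause in clauses:
        signs = {sign for atom, sign in clause if atom == p}
        if not signs:
            result.append(clause)  # p does not occur in K: keep K unchanged
            continue
        # Assumption: K does not contain both p and its negation
        # (otherwise this unpacking raises ValueError).
        (sign,) = signs
        delta = frozenset(lit for lit in clause if lit[0] != p)  # empty delta = false
        result.extend(alpha | delta for alpha in disj(p, sign, k))
    return result

def transform(clauses, k=2):
    """Function T: eliminate original variables until none is left in C'."""
    original = {atom for clause in clauses for atom, _ in clause}
    current = list(clauses)
    while True:
        remaining = original & {atom for clause in current for atom, _ in clause}
        if not remaining:
            return current  # Output(C')
        p = min(remaining)  # "pick any p"; min only makes the run deterministic
        current = elim(current, p, k)

C = [frozenset({("a", True), ("b", False), ("c", True)})]
C_star = transform(C)
print(len(C_star), {len(clause) for clause in C_star})  # 1000 {9}
```

The loop condition mirrors the WHILE test of Function T: the fresh atoms `(p, i)` never belong to $propvar(C)$, so the loop stops exactly when every original variable has been eliminated.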

Let $C^* = T(C)$ be the final result of $T$. $C^*$ contains no original variable from $propvar(C)$. Note that $C^*$ can be computed in polynomial time from $C$. Indeed, every clause of three literals of $C$ gives rise to exactly $\binom{5}{3}^3 = 10^3 = 1000$ clauses of 9 literals each in $C^*$. While computing $C^*$ from $C$, we can thus replace each 3-literal clause of $C$ at once and independently by the corresponding 1000 clauses. Similar direct replacements (but with fewer result clauses) are, of course, possible for two-literal and one-literal clauses of $C$. Assuming appropriate data structures, the transformation from $C$ to $C^*$ can thus actually be done in linear time.

¹ An empty $\delta$ is equal to false, and it is understood that $\alpha \lor false$ is simply $\alpha$.
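The direct, clause-by-clause replacement mentioned in the previous paragraph can be sketched as follows. Under our assumptions (clauses as frozensets of `(atom, sign)` literals over distinct atoms, fresh atoms named `(p, i)`, hypothetical helper names), each clause is replaced at once by choosing one $\alpha \in Disj^{\pm}(x)$ for every literal on atom $x$ and taking the disjunction of the choices. This should yield the same clause set as eliminating the variables one by one.

```python
from itertools import combinations, product

def disj(p, positive, k=2):
    """Disj^+(p) or Disj^-(p) over the fresh atoms (p, 1), ..., (p, 2k+1)."""
    atoms = [(p, i) for i in range(1, 2 * k + 2)]
    return [frozenset((a, positive) for a in combo)
            for combo in combinations(atoms, k + 1)]

def replace_clause(clause, k=2):
    """Replace one clause of C at once: pick one alpha in Disj^{sign}(atom) for
    every literal of the clause and take the disjunction of the chosen alphas."""
    choices = [disj(atom, sign, k) for atom, sign in clause]
    return [frozenset().union(*alphas) for alphas in product(*choices)]

def direct_transform(clauses, k=2):
    """C* computed clause by clause; for fixed k each clause yields a bounded
    number of new clauses, so the running time is linear in the size of C."""
    return [new for clause in clauses for new in replace_clause(clause, k)]

three = frozenset({("a", True), ("b", False), ("c", True)})
print(len(replace_clause(three)))                             # 1000
print(len(replace_clause(frozenset({("a", True), ("b", True)}))))  # 100
```

One-literal clauses give 10 result clauses and two-literal clauses give 100, matching the "fewer result clauses" remark above.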

We now need to prove (1) that $C^*$ is satisfiable iff $C$ is, and (2) that $C^*$ is 2-supersymmetric.

Fact 1: $C^*$ is satisfiable iff $C$ is. We will actually prove more than we need here. In fact, our proof of Fact 1 below also shows that a satisfying assignment to $C$ can be transformed into many satisfying assignments to $C^*$. We will use this when we come to prove supersymmetry in Fact 2. We prove Fact 1 by showing that, when at each step of algorithm $T$, $C'$ is transformed into its next value $C'' = elim(C', p)$, then $C'$ and $C''$ are satisfaction-equivalent. Fact 1 then follows by induction. Assume $C'$ is satisfied via a truth value assignment $\tau'$. Then let $\tau''$ be any truth value assignment to the propositional variables of $C''$ with the following properties:

• For each propositional variable $q$ of $C''$ not in $New(p)$, $\tau''(q) = \tau'(q)$;

• if $\tau'(p) = true$, then at least 3 of the variables in $New(p)$ are set true by $\tau''$; and

• if $\tau'(p) = false$, then at most two of the variables in $New(p)$ are set true by $\tau''$ (and at least three are thus set false).

By definition of $C''$, $\tau''$ must satisfy $C''$. In fact, assume first $\tau'(p) = true$. Let $K$ be a clause of $C'$ in which $p$ occurs positively. Then, given that at least three variables in $New(p)$ are set true by $\tau''$, at most two are set false, so each element of $Disj^+(p)$, being a disjunction of three distinct atoms, must have at least one atom made true by $\tau''$; thus each of the clauses of $T^+(K)$ in $C''$ evaluates to true via $\tau''$. All other clauses of $C''$ stem from clauses of $C'$ that were made true by literals corresponding to an atom $q$ different from $p$. But, by definition of $\tau''$, these literals keep their truth values, and hence make the clauses true. In summary, all clauses of $C''$ are satisfied by $\tau''$. In a very similar way it is shown that $\tau''$ satisfies $C''$ if $\tau'(p) = false$.

Vice versa, assume some truth value assignment $\tau''$ satisfies $C''$. Then it is not hard to see that $C'$ must be satisfied by the truth value assignment $\tau'$ to $C'$ defined as follows: if a majority (i.e., 3 or more) of the five atoms in $New(p)$ are made true via $\tau''$, then let $\tau'(p) = true$, otherwise let $\tau'(p) = false$; moreover, for all propositional variables $q \notin New(p)$, let $\tau'(q) = \tau''(q)$.
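The two directions of the correspondence between $\tau'$ and $\tau''$ can be sketched as a pair of small functions. This is an illustration under our own assumptions (assignments as Python dicts, fresh atoms named `(p, i)`); `lift` picks just one of the many admissible $\tau''$ allowed by the three conditions listed earlier, and `project` implements the majority rule just described. Both names are hypothetical.

```python
def lift(tau, p, k=2):
    """One admissible tau'' for a given tau': keep every other variable, and
    encode tau'(p) by making the first k+1 copies of p true (tau'(p) true)
    or only the first k copies true (tau'(p) false)."""
    lifted = {q: v for q, v in tau.items() if q != p}
    n_true = k + 1 if tau[p] else k
    for i in range(1, 2 * k + 2):
        lifted[(p, i)] = i <= n_true
    return lifted

def project(lifted, p, k=2):
    """tau' from tau'': p is true iff a majority (at least k+1) of New(p) is
    true; all variables outside New(p) keep their values."""
    copies = [(p, i) for i in range(1, 2 * k + 2)]
    tau = {q: v for q, v in lifted.items() if q not in copies}
    tau[p] = sum(lifted[c] for c in copies) >= k + 1
    return tau

print(project(lift({"p": True, "q": False}, "p"), "p"))  # {'q': False, 'p': True}
```

Any other way of choosing which copies are true works equally well; that freedom is what Fact 2 exploits.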

To see that $\tau'$ satisfies $C'$, consider first the case that three or more of the propositional variables of $New(p)$ are assigned true by $\tau''$. Note that all clauses of $C'$ that contain neither $p$ nor $\neg p$ are trivially satisfied by $\tau'$, as $\tau'$ and $\tau''$ coincide on their atoms. Now let us consider any clause $K$ of $C'$ in which $p$ occurs positively. Then the only clauses of $C''$ that contain positive occurrences of elements of $New(p)$ are those in the sets $T^+(K)$. If $\tau''$ makes at least three of the five atoms in $New(p)$ true, then any clause in $T^+(K)$ is made true by atoms of $New(p)$. Thus, when replacing these atoms by $p$ and assigning $p$ true, the resulting clause $K$ remains true. Now consider a clause $K = \neg p \lor \delta$ of $C'$ in which $p$ occurs negatively. The only clauses containing negative $New(p)$-literals in $C''$ are, by definition of $C''$, those in $T^-(K)$. Recall that we assumed that $\tau''$ satisfies at least three distinct atoms from $New(p)$. Let three of these satisfied atoms be $p_i$, $p_j$, and $p_k$. By definition, $T^-(K)$ contains a clause of the form $\neg p_i \lor \neg p_j \lor \neg p_k \lor \delta$. Given that this clause is satisfied by $\tau''$, but $\tau''$ falsifies $\neg p_i \lor \neg p_j \lor \neg p_k$, $\delta$ is satisfied by $\tau''$, and since $\delta$ contains no $New(p)$-literals, $\delta$ is also satisfied by $\tau'$. Therefore, $K = \neg p \lor \delta$ is satisfied by $\tau'$. This concludes the case where three or more of the propositional variables of $New(p)$ are assigned true by $\tau''$. The case where three or more of the propositional variables of $New(p)$ are assigned false by $\tau''$ is completely symmetric, and can thus be settled in a totally similar way.
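Fact 1 can be sanity-checked on tiny formulas by brute force. The sketch below is only a test harness under our own assumptions (the same clause representation and hypothetical helper names as in a direct implementation of $elim$); it compares satisfiability of $C'$ and of $elim(C', p)$ by enumerating all assignments, which is exponential and therefore limited to a handful of atoms.

```python
from itertools import combinations, product

def disj(p, positive, k=2):
    """Disj^+(p) or Disj^-(p) over the fresh atoms (p, 1), ..., (p, 2k+1)."""
    atoms = [(p, i) for i in range(1, 2 * k + 2)]
    return [frozenset((a, positive) for a in combo)
            for combo in combinations(atoms, k + 1)]

def elim(clauses, p, k=2):
    """elim(C', p) for clauses given as frozensets of (atom, sign) literals."""
    result = []
    for clause in clauses:
        signs = {sign for atom, sign in clause if atom == p}
        if not signs:
            result.append(clause)
            continue
        (sign,) = signs  # assumes K does not contain both p and its negation
        delta = frozenset(lit for lit in clause if lit[0] != p)
        result.extend(alpha | delta for alpha in disj(p, sign, k))
    return result

def satisfiable(clauses):
    """Brute-force satisfiability check; exponential, for tiny formulas only."""
    atoms = sorted({atom for clause in clauses for atom, _ in clause}, key=repr)
    for values in product([False, True], repeat=len(atoms)):
        tau = dict(zip(atoms, values))
        if all(any(tau[atom] == sign for atom, sign in clause) for clause in clauses):
            return True
    return False

examples = [
    [frozenset({("a", True), ("b", True)}), frozenset({("a", False), ("b", True)})],  # satisfiable
    [frozenset({("a", True)}), frozenset({("a", False)})],                            # unsatisfiable
]
for C1 in examples:
    print(satisfiable(C1), satisfiable(elim(C1, "a")))
```

For the second example, $elim$ produces all ten positive and all ten negative 3-clauses over $New(a)$, which force at most two false and at most two true atoms among five, so the result is unsatisfiable, as Fact 1 predicts.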

Fact 2: Proof that $C^*$ is 2-supersymmetric. Assume $C^*$ is satisfiable by some truth value assignment $\tau_1$. Then $C$ is satisfiable by some truth value assignment $\tau$, and thus $C^*$ is satisfiable by some truth value assignment $\tau^*$ constructed inductively as described in the proof of Fact 1. Let us have a closer look at the inductive construction used to obtain $\tau^*$ in Fact 1. For any initially fixed pair of propositional variables $p_i, q_j \in propvar(C^*)$, where $1 \leq i, j \leq 5$, the construction of $\tau^*$ gives us a very large degree of freedom for choosing $\tau^*$. Actually, the construction is so general that it allows us to let $p_i, q_j$ take on any of the four possible joint truth value assignments. In fact, however we choose the truth value assignments for two among the variables in $\{p_1, \dots, p_5, q_1, \dots, q_5\}$, there is always enough flexibility for assigning the remaining variables in this set some truth values that ensure that the majority of variables has the truth value required by the proof of Fact 1 for representing the original truth value of $p$ (respectively, $q$) via $\tau'$. (This holds even in case $p$ and $q$ are one and the same variable, and we thus want to force two elements from $\{p_1, \dots, p_5\}$ to take on some truth values; see the second example below.) Let us give two examples that illustrate the two characteristic cases to consider. First, assume $p$ and $q$ are distinct and $\tau$ satisfies $p$ and falsifies $q$. We would like to construct, for example, a truth value assignment $\tau^*$ that falsifies $p_2$ and simultaneously satisfies $q_4$. In constructing $\tau^*$, the only requirements on $New(p)$ and $New(q)$ are that at least three variables from $New(p)$ need to be satisfied by $\tau^*$, but no more than two from $New(q)$ may be satisfied by $\tau^*$. For instance, we may then set $\tau^*(p_1) = \tau^*(p_3) = \tau^*(p_4) = \tau^*(p_5) = true$ and $\tau^*(p_2) = false$, and $\tau^*(q_1) = \tau^*(q_2) = \tau^*(q_3) = \tau^*(q_5) = false$ and $\tau^*(q_4) = true$. This achieves the desired truth value assignment to $p_2$ and $q_4$. An extension to a full satisfying truth value assignment $\tau^*$ for $C^*$ is guaranteed. Now, as a second example, assume that $\tau(p) = false$, but we would like $\tau^*(p_1)$ and $\tau^*(p_2)$ to be simultaneously true in a truth value assignment satisfying $C^*$. Note that in this case, the only requirement on $New(p)$ in the construction of $\tau^*$ is that at most two atoms from $New(p)$ may be assigned true. Here we have a single option only: set $\tau^*(p_1) = \tau^*(p_2) = true$ and $\tau^*(p_3) = \tau^*(p_4) = \tau^*(p_5) = false$. This option works perfectly, and assigns the desired truth values to $p_1$ and $p_2$. In summary, $C^*$ is 2-supersymmetric.
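The flexibility argument of Fact 2 is local to each block $New(p)$: the blocks of different variables are disjoint, and Fact 1 only constrains the majority value inside each block. Fixing $k$ chosen variables therefore fixes at most $k$ variables in any single block. The following brute-force sketch (our own check; the function name `block_is_flexible` is hypothetical) verifies that fixing any at most $k$ variables of a block of $2k+1$ variables still allows either majority value, and that fixing $k+1$ variables does not.

```python
from itertools import combinations, product

def block_is_flexible(k, max_fixed=None):
    """Check the per-block freedom used in Fact 2: for a block New(p) of 2k+1
    variables, fixing any at most max_fixed (default k) of them to arbitrary
    values leaves a completion whose majority value (at least k+1 true means
    true) equals either required truth value of p."""
    if max_fixed is None:
        max_fixed = k
    n = 2 * k + 1
    assignments = list(product([False, True], repeat=n))
    for size in range(max_fixed + 1):
        for positions in combinations(range(n), size):
            for fixed in product([False, True], repeat=size):
                for p_value in (False, True):
                    if not any(all(a[i] == v for i, v in zip(positions, fixed))
                               and (sum(a) >= k + 1) == p_value
                               for a in assignments):
                        return False
    return True

print(block_is_flexible(2), block_is_flexible(2, max_fixed=3))  # True False
```

The failing case with three fixed variables corresponds to forcing three elements of $New(p)$ to false while $\tau(p) = true$ requires at least three true ones.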

The proof for $k > 2$ is totally analogous, except for the following modifications:

• Instead of creating for each propositional variable $p \in propvar(C)$ a set $New(p) = \{p_1, p_2, \dots, p_5\}$ of five new variables, we now create a set $New(p) = \{p_1, p_2, \dots, p_{2k+1}\}$ of $2k + 1$ new propositional variables.

• The set $Disj^+(p)$ is now defined as the set of all disjunctions of $k + 1$ distinct positive atoms from $New(p)$. Similarly, $Disj^-(p)$ is now defined as the set of all disjunctions of $k + 1$ negative literals obtained by negating distinct atoms from $New(p)$.

• We replace the numbers 2 and 3 by $k$ and $k + 1$, respectively.

• We note that now each three-literal clause of $C$ is replaced no longer by $\binom{5}{3}^3$ clauses but by $\binom{2k+1}{k+1}^3$ clauses.

• We note that the resulting clause set $C^*$ is now a $3(k+1)$-SAT instance.

It is easy to see that the proofs of Fact 1 and Fact 2 above go through with these 
modifications. 
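The numeric effect of these modifications is easy to tabulate. The following sketch (the helper name `blowup` is ours) computes, for a clause of $C$ with three literals over distinct atoms, the number of clauses replacing it and the number of literals in each, using Python's `math.comb`.

```python
from math import comb

def blowup(k, clause_len=3):
    """(number of clauses, literals per clause) replacing one clause of C that
    has clause_len literals over distinct atoms, when |New(p)| = 2k+1."""
    return comb(2 * k + 1, k + 1) ** clause_len, clause_len * (k + 1)

for k in (1, 2, 3):
    print(k, blowup(k))
# 1 (27, 6)
# 2 (1000, 9)
# 3 (42875, 12)
```

For every fixed $k$ the blow-up per clause is a constant, which is why the transformation stays linear in the size of $C$.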

Finally, let us recall that any 2-supersymmetric SAT instance is trivially also 1-supersymmetric, which settles the lemma for $k = 1$. □



Appendix B. Proof of Theorem 4.3 



Theorem 4.3 For each fixed integer $k \geq 3$, deciding for a single bi-valued constraint $\rho$ whether $sol(\Pi_{S_k}(\rho)) = \rho$, that is, whether $\rho$ is $k$-decomposable, is coNP-complete.

Proof. It suffices to show coNP-hardness, as membership in coNP already follows from Theorem 4.2. We first prove coNP-hardness for the case $k = 3$.

Consider a nonempty 3SAT instance $C = \{C_1, \dots, C_m\}$ over a set $propvar(C) = \{p_1, \dots, p_n\}$ of propositional variables, where each $C_i$ is a clause containing precisely 3 literals whose corresponding atoms are mutually distinct.

Let us first define two numbers $r_0$ and $r$ from $C$, whose meaning and use will become clear later on. Let $r_0$ denote eight times the number of 3-element sets $\{p_a, p_b, p_c\}$ of mutually distinct propositional variables $p_a, p_b, p_c \in propvar(C)$ that do not all three jointly appear in any clause of $C$. Note that $r_0 \leq 8\binom{n}{3}$. Let, moreover, $r = 7m + r_0$. Clearly, $r$ is polynomially bounded in the size of $C$, as $r \leq 7m + 8\binom{n}{3}$.

We construct in polynomial time a bi-valued constraint $\rho$ of $r$ tuples, such that $sol(\Pi_{S_3}(\rho)) \neq \rho$ iff $C$ is satisfiable. The scope $scope(\rho)$ of $\rho$ contains for each $p_i \in propvar(C)$ a list of $r + 1$ variables $X_i^0, X_i^1, \dots, X_i^r$. Intuitively, in each tuple $t$ of $rel(\rho)$, for each $1 \leq i \leq n$, the values assigned to the variables $X_i^0, X_i^1, \dots, X_i^r$ either encode a truth value assignment to $p_i$, in which case all variables of this list are assigned the same value, zero or one, or they encode a tuple identifier for the tuple in which they occur. A tuple identifier for the $s$-th tuple of $rel(\rho)$ assigns the value zero to all $X_i^j$ where $j < s$ and the value one to all $X_i^j$ where $j \geq s$. This will be made more formal below.
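The tuple-identifier encoding can be illustrated with a few lines of Python. This sketch (hypothetical helper names, bits as a list indexed by $j = 0, \dots, r$) builds the identifier of tuple number $s$ and lists the positions $l$ where a bit vector switches from 0 at position $l-1$ to 1 at position $l$; the proof below relies on identifiers being non-decreasing, starting with 0, ending with 1, and having their only 0-to-1 switch at position $s$, whereas constant truth-value blocks have no switch at all.

```python
def tuple_identifier(s, r):
    """Bits for X^0, ..., X^r of the identifier of tuple number s (1 <= s <= r):
    0 at positions j < s and 1 at positions j >= s."""
    if not 1 <= s <= r:
        raise ValueError("tuple numbers range over 1..r")
    return [0 if j < s else 1 for j in range(r + 1)]

def zero_one_transitions(bits):
    """Positions l with bits[l-1] == 0 and bits[l] == 1."""
    return [l for l in range(1, len(bits)) if bits[l - 1] == 0 and bits[l] == 1]

print(tuple_identifier(3, 5))                         # [0, 0, 0, 1, 1, 1]
print(zero_one_transitions(tuple_identifier(3, 5)))   # [3]
```

Because bit 0 of every identifier is 0 and bit $r$ is always 1, a value 1 read at $X_i^0$ or a value 0 read at $X_i^r$ can only come from a constant truth-value block.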

The constraint relation $rel(\rho)$ consists of two groups of tuples:

Clause-induced tuples. These are $7m$ tuples, namely, seven for each clause $C_h$, $1 \leq h \leq m$. These tuples are numbered from 1 to $7m$. Each of these tuples describes one of the 7 legal truth value assignments (out of 8 possible) to the three propositional variables of a clause $C_h \in C$. For each clause $C_h$, $1 \leq h \leq m$, and each truth value assignment $\tau_j: propvar(C_h) \to \{0, 1\}$ among all 7 permitted truth value assignments to the propositional variables of $C_h$, where $1 \leq j \leq 7$, $rel(\rho)$ contains precisely one tuple $t_h^j$, whose components are described as follows. For each $p_i \in propvar(C_h)$, $t_h^j[X_i^0] = t_h^j[X_i^1] = t_h^j[X_i^2] = \dots = t_h^j[X_i^r] = \tau_j(p_i)$. Moreover, for each $p_i \in propvar(C) - propvar(C_h)$, $t_h^j[X_i^0] = 0$, and the assignments to $X_i^1, \dots, X_i^r$ jointly constitute a unique tuple identifier that exclusively appears in the tuple $t_h^j$, and that encodes the tuple number $s$ of the tuple $t_h^j$ (namely $s = 7(h - 1) + j$) in a very simple way: it assigns 0 to all $X_i^{s'}$ where $0 \leq s' < s$ and 1 to all variables $X_i^{s'}$ where $s \leq s' \leq r$.

Auxiliary tuples. These are no more than $8\binom{n}{3}$ tuples: eight for each 3-element set $\{p_a, p_b, p_c\}$ of mutually distinct propositional variables $p_a, p_b, p_c \in propvar(C)$ that do not all three jointly appear in any clause of $C$. These auxiliary tuples are numbered from $7m + 1$ to $r$, where $r \leq 7m + 8\binom{n}{3}$ is the total number of tuples in $\rho$. Essentially, the eight auxiliary tuples associated with each such set $\{p_a, p_b, p_c\}$ encode the eight truth value assignments $\alpha_1, \dots, \alpha_8$ to the propositional variables $p_a$, $p_b$, and $p_c$. These tuples thus do not encode effective constraints, as they reflect every truth value assignment to $p_a$, $p_b$, and $p_c$, but they will be needed for technical reasons. More formally, for each set $S = \{p_a, p_b, p_c\}$ as above, and each truth value assignment $\alpha$ to $\{p_a, p_b, p_c\}$, $rel(\rho)$ contains a tuple $t_S^\alpha$, whose components are described as follows. For each $p_l \in S$, $t_S^\alpha[X_l^0] = t_S^\alpha[X_l^1] = t_S^\alpha[X_l^2] = \dots = t_S^\alpha[X_l^r] = \alpha(p_l)$. Moreover, for each $p_l \in propvar(C) - S$, $t_S^\alpha[X_l^0] = 0$, and the assignments to $X_l^1, \dots, X_l^r$, just as before, jointly constitute a unique tuple identifier that exclusively appears in the tuple $t_S^\alpha$, and that encodes the tuple number $s$ of the tuple $t_S^\alpha$ by assigning 0 to all $X_l^{s'}$ where $s' < s$ and 1 to all variables $X_l^{s'}$ where $s' \geq s$.

This concludes the definition of $\rho$.
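For readers who want to experiment, the construction of $rel(\rho)$ can be sketched in Python as follows. The input format is our assumption (clauses as DIMACS-style triples of non-zero integers, `i` meaning $p_i$ and `-i` meaning $\neg p_i$, over three distinct variables), and the function name `build_rho` is hypothetical. Rows are flat tuples in which column `(i-1)*(r+1)+j` holds the value of $X_i^j$; the code creates eight auxiliary tuples per uncovered 3-set, so that the number of rows equals $r$.

```python
from itertools import combinations, product

def build_rho(n, clauses):
    """Sketch of rel(rho) for a 3SAT instance over p_1..p_n.

    clauses: list of 3-tuples of non-zero ints over three distinct variables.
    Returns (r, rows); column (i-1)*(r+1)+j of a row holds the value of X_i^j.
    """
    covered = {frozenset(abs(lit) for lit in c) for c in clauses}
    uncovered = [S for S in combinations(range(1, n + 1), 3)
                 if frozenset(S) not in covered]
    r = 7 * len(clauses) + 8 * len(uncovered)
    specs = []  # truth-value part of each tuple, in tuple-number order
    for c in clauses:                    # clause-induced tuples
        for values in product([0, 1], repeat=3):
            if any((v == 1) == (lit > 0) for lit, v in zip(c, values)):
                specs.append(dict(zip((abs(lit) for lit in c), values)))
    for S in uncovered:                  # auxiliary tuples
        for values in product([0, 1], repeat=3):
            specs.append(dict(zip(S, values)))
    rows = []
    for s, spec in enumerate(specs, start=1):
        row = []
        for i in range(1, n + 1):
            if i in spec:
                row.extend([spec[i]] * (r + 1))  # constant block: truth value of p_i
            else:
                row.extend(0 if j < s else 1 for j in range(r + 1))  # identifier of tuple s
        rows.append(tuple(row))
    return r, rows

r, rows = build_rho(4, [(1, -2, 3)])
print(r, len(rows), len(rows[0]))  # 31 31 128
```

With four variables and the single clause $p_1 \lor \neg p_2 \lor p_3$, three 3-sets are uncovered, giving $7 + 24 = 31$ tuples of width $4 \cdot 32 = 128$.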

CLAIM: $sol(\Pi_{S_3}(\rho)) \neq \rho$ iff $C$ is satisfiable.

We first prove the if-part of the claim. Assume $C$ is satisfiable. Thus there exists a truth value assignment $\tau$ to $propvar(C)$ satisfying $C$. We show that $sol(\Pi_{S_3}(\rho))$ then must contain the tuple $t \notin rel(\rho)$ defined as follows. For each $1 \leq i \leq n$, $t[X_i^0] = t[X_i^1] = t[X_i^2] = \dots = t[X_i^r] = \tau(p_i)$. To see this, it suffices to observe that the projection $t[S]$ of $t$ to any set $S = \{X_a^u, X_b^v, X_c^w\}$ of three distinct variables from $scope(\rho)$ is contained in the corresponding relation $\Pi_S(\rho)$ of $\Pi_{S_3}(\rho)$.

In fact, if the atoms $p_a$, $p_b$, and $p_c$ jointly occur in a clause $C_h$ of $C$, then the tuple $t'$ in $rel(\rho)$ induced by $C_h$ for the truth value assignment $\tau[p_a, p_b, p_c]$ (the restriction of $\tau$, which satisfies $C_h$) coincides in its $S$-components with the tuple $t$; in other terms, $t'[S] = t[S]$. Hence $t[S]$ is contained in the relation $\Pi_S(\rho)$ of $\Pi_{S_3}(\rho)$. Moreover, in case $p_a$, $p_b$, and $p_c$ do not jointly occur in a clause of $C$, there must exist an auxiliary tuple $t'$ such that $t'[S] = t[S]$, and thus, again, $t[S]$ is contained in the relation $\Pi_S(\rho)$ of $\Pi_{S_3}(\rho)$. In summary, $t$ is contained in the join of the exactly ternary relations of $\Pi_{S_3}(\rho)$. As is easily verified, the binary and unary relations of $\Pi_{S_3}(\rho)$ are weaker than the ternary ones, and actually redundant; the join of all constraints with precisely three variables is in fact equal to $sol(\Pi_{S_3}(\rho))$. It follows that $t$ is contained in the join of $\Pi_{S_3}(\rho)$, which is $sol(\Pi_{S_3}(\rho))$. However, $t$ is not in $rel(\rho)$ because $t$ does not contain any tuple identifier, whereas each tuple of $rel(\rho)$ does.
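The argument above establishes membership of $t$ in $sol(\Pi_{S_k}(\rho))$ by checking that every projection of $t$ onto a set of $k$ variables is also the projection of some tuple of $\rho$. The following generic Python sketch (our own illustration; the names `in_projection_join` and `is_k_decomposable` are hypothetical) implements exactly this test with one bitmask per column, and a brute-force decision procedure for $k$-decomposability over a small domain. It is exponential in the arity, in line with the coNP-hardness shown here, and is only meant for toy relations such as the parity relation used below.

```python
from itertools import combinations, product

def in_projection_join(rows, t, k):
    """True iff t lies in sol(Pi_{S_k}(rho)): for every set S of k positions
    (all positions if the arity is below k) some row of rho agrees with t on S.
    Smaller sets need no separate check, since each lies inside a k-set."""
    width = len(t)
    # agree[p] is a bitmask of the rows that agree with t at position p
    agree = [sum(1 << idx for idx, row in enumerate(rows) if row[p] == t[p])
             for p in range(width)]
    for S in combinations(range(width), min(k, width)):
        mask = -1  # all bits set
        for p in S:
            mask &= agree[p]
        if mask == 0:
            return False
    return True

def is_k_decomposable(rho, k, domain=(0, 1)):
    """Brute-force test of sol(Pi_{S_k}(rho)) == rho for a small non-empty set rho."""
    rows = sorted(rho)
    arity = len(rows[0])
    return all(t in rho or not in_projection_join(rows, t, k)
               for t in product(domain, repeat=arity))

xor = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
print(in_projection_join(sorted(xor), (0, 0, 1), 2))          # True, yet (0, 0, 1) is not in xor
print(is_k_decomposable(xor, 2), is_k_decomposable(xor, 3))  # False True
```

Any relation is trivially decomposable when $k$ equals its arity, as the second call illustrates; the interesting question, and the hard one, is the case of fixed $k$ below the arity.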

It now remains to show that, whenever $sol(\Pi_{S_3}(\rho))$ contains a tuple $t \notin rel(\rho)$, then $t$ corresponds to a satisfying truth value assignment for $C$, and $C$ is thus satisfiable. Let $t$ be such a tuple. We first show that for each $1 \leq i \leq n$ and each $0 \leq v \leq r$ and $0 \leq w \leq r$ it must hold that $t[X_i^v] = t[X_i^w]$; thus all bits of $t[X_i^0, X_i^1, \dots, X_i^r]$ must be equal. We prove this by showing that this bit-vector cannot have two consecutive bits of different value.

• Assume that for some $1 \leq l \leq r$, $t[X_i^{l-1}] = 0$ while $t[X_i^l] = 1$. By construction, $rel(\rho)$ contains at most a single tuple $t'$ for which $t'[X_i^{l-1}] = 0$ but $t'[X_i^l] = 1$, namely the tuple numbered $l$. Therefore, in each relation $rel(c)$ of any constraint $c$ of $\Pi_{S_3}(\rho)$ where $scope(c)$ contains $X_i^{l-1}$, $X_i^l$, and any other variable $X_j^w$, there is a single tuple $f_c$ having $f_c[X_i^{l-1}] = 0$ and $f_c[X_i^l] = 1$ (such a tuple must exist because $t$ is in the join, and it is the projection of $t'$). It follows that $sol(\Pi_{S_3}(\rho))$ contains a unique tuple whose $X_i^{l-1}$-value is zero and whose $X_i^l$-value is one, namely the tuple $t'$. Therefore $t = t'$, which contradicts our assumption that $t \notin rel(\rho)$.

• Assume that for some $1 \leq l \leq r$, $t[X_i^{l-1}] = 1$ while $t[X_i^l] = 0$. Observe that, by construction, $rel(\rho)$ does not contain any tuple $t'$ for which $t'[X_i^{l-1}] = 1$ while $t'[X_i^l] = 0$. In fact, $rel(\rho)$ was carefully constructed so that the bit values in the sequences $t'[X_i^0, X_i^1, \dots, X_i^r]$ never decrease in any of its tuples. Therefore, in no relation $rel(c)$ of any constraint $c$ of $\Pi_{S_3}(\rho)$ where $scope(c)$ contains $X_i^{l-1}$, $X_i^l$, and any other variable $X_j^w$ is there a tuple $f$ having $f[X_i^{l-1}] = 1$ and $f[X_i^l] = 0$. It follows that the join $sol(\Pi_{S_3}(\rho))$ contains no tuple whose $X_i^{l-1}$-value is one and whose $X_i^l$-value is zero. Contradiction.

We have thus established that for $1 \leq i \leq n$, all bits of $t[X_i^0, X_i^1, \dots, X_i^r]$ must be equal. Let $\tau$ be the truth value assignment that for $1 \leq i \leq n$ associates to each $p_i$ the truth value $t[X_i^0] = t[X_i^1] = \dots = t[X_i^r]$. Let $C_h$ be any clause of $C$. Let the atoms of $C_h$ be $p_a$, $p_b$, and $p_c$. Define:

$X_{(a)} := X_a^0$ if $\tau(p_a) = 1$ and $X_{(a)} := X_a^r$ if $\tau(p_a) = 0$;
$X_{(b)} := X_b^0$ if $\tau(p_b) = 1$ and $X_{(b)} := X_b^r$ if $\tau(p_b) = 0$;
$X_{(c)} := X_c^0$ if $\tau(p_c) = 1$ and $X_{(c)} := X_c^r$ if $\tau(p_c) = 0$.

Consider the constraint $q$ of $\Pi_{S_3}(\rho)$ having $(X_{(a)}, X_{(b)}, X_{(c)})$ as scope. This constraint must have a tuple $t_q = (\tau(p_a), \tau(p_b), \tau(p_c))$, which is obviously identical to $t[X_{(a)}, X_{(b)}, X_{(c)}]$. There is, therefore, a tuple $t' \in rel(\rho)$ such that

$t'[X_{(a)}, X_{(b)}, X_{(c)}] = (\tau(p_a), \tau(p_b), \tau(p_c))$.

Given the specific values and positions of $X_{(a)}$, $X_{(b)}$, and $X_{(c)}$ in $t'$, it is easily seen that the tuple $t'$ must belong to the group of clause-induced tuples, and more specifically, $t'$ is induced by precisely clause $C_h$ and the truth value assignment $\tau[p_a, p_b, p_c]$. To see this, let us first recall that in our encoding of a tuple identifier the first bit (i.e., bit 0) is always 0 and the last bit (i.e., bit $r$) is always 1, which is never the case for the encoding of a truth value. Now consider $\tau(p_a)$. If $\tau(p_a) = 0$, then $t[X_{(a)}] = t'[X_{(a)}] = t'[X_a^r] = 0$. If $X_{(a)} = X_a^r$ were part of a tuple identifier in $t'$, then $t'[X_a^r]$, which is identical to $t[X_{(a)}]$, could never have value zero, because bit $r$ of a tuple identifier is always 1. Therefore, $X_{(a)}$ must be part of a (representation of a) truth value assignment. Similarly, if $\tau(p_a) = 1$, then $t[X_{(a)}] = t'[X_{(a)}] = t'[X_a^0] = 1$. If $X_{(a)} = X_a^0$ were part of a tuple identifier, $t'[X_{(a)}]$ could never have value one, because all tuple identifiers have value zero at their bit position of index zero. Therefore, again, $X_{(a)}$ must be part of a (representation of a) truth value assignment. Exactly the same reasoning applies to $X_{(b)}$ and $X_{(c)}$. In summary, $t'$ is a tuple of $\rho$ that exactly describes the truth value assignment $\tau$ restricted to the three propositional variables $p_a$, $p_b$, and $p_c$. Given that these propositional variables jointly occur in clause $C_h$, $t'$ is a clause-induced tuple, and $\tau$ is a "legal" truth value assignment that satisfies $C_h$. Given that $C_h$ was an arbitrary clause of $C$, $\tau$ satisfies all clauses of $C$, and thus $C$ is satisfiable. We are done for $k = 3$. The proof is easily modified to hold for any larger fixed value $k$. It suffices, for example, to start with $k$SAT instead of 3SAT. The proof goes through with the obvious adjustments to the numeric parameters. □



Appendix C. Proof of Theorem 4.4 



Theorem 4.4 For each fixed integer $k \geq 2$, deciding for a single tri-valued constraint $\rho$ whether $sol(\Pi_{S_k}(\rho)) = \rho$, that is, whether $\rho$ is $k$-decomposable, is coNP-complete.

Proof. For all constants $k$, the membership in coNP of our decision problem is already covered by (the upper bound in) Theorem 4.2. Moreover, the coNP-hardness for $k \geq 3$ is already proven in Theorem 4.3, as bi-valued relations are trivially also tri-valued relations (where the additional value appears in the domain but not in the actual constraint relation). Thus, what remains to be done is to prove coNP-hardness for $k = 2$.

We use a transformation from 3COL, for a graph $G = (V, E)$, as described in the proof of Theorem 4.2, by applying similar vectorization techniques as in the proof of Theorem 4.3. In particular, consider the relation $\rho$ obtained from the 3-colorability network $N_{COL}$ in the hardness part of the proof of Theorem 4.2, and let $s = |\rho|$ be the cardinality of $\rho$. Rather than transforming $N_{COL}$ (and thus the graph $G$) to $\rho$, we will transform it to a tri-valued constraint $\rho^*$ of the same cardinality $s$ that closely resembles $\rho$. To this aim, for $1 \leq i \leq n$, every scope variable of $N_{COL}$ (and thus of $\rho$) is replaced by a block of $s + 1$ variables $X_i^0, \dots, X_i^s$, which either encodes a color from $\{r, g, b\}$ or a tuple identifier. We here use the following encoding:

• Color red is encoded as a block consisting of s + 1 consecutive positions having 
value r. 

• Color green is encoded as a block consisting of s+ 1 consecutive positions having 
value g. 

• Color blue is encoded by a leading $b$ (as an assignment to $X_i^0$) followed by a block containing $s$ consecutive positions having value $r$.

• The tuple identifier for tuple number $d$ is a block of length $s + 1$ starting with a sequence of one or more $r$ elements, having a $b$ in the position corresponding to $X_i^d$, followed by $g$ elements. In other terms, this tuple identifier is a sequence of length $s + 1$ of the form $r, \dots, r, b, g, \dots, g$ whose $(d+1)$st component is $b$.
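The block encoding over $\{r, g, b\}$ can be sketched in Python as follows; the function names and the representation of a block as a list of one-letter strings are our own choices for illustration. The test code checks the local properties of these blocks on which the numbered facts in the proof rely: a $b$ at a position $j \neq 0$ occurs only in the identifier of tuple $j$; no $rg$ or $gr$ occurs at consecutive positions $l, l+1$ with $l \geq 1$; and $X^1 = r$ implies $X^0 \in \{r, b\}$, while $X^1 = g$ implies $X^0 = g$.

```python
def encode_color(color, s):
    """Block X_i^0, ..., X_i^s encoding a vertex color."""
    if color == "red":
        return ["r"] * (s + 1)
    if color == "green":
        return ["g"] * (s + 1)
    if color == "blue":
        return ["b"] + ["r"] * s
    raise ValueError(f"unknown color {color!r}")

def encode_identifier(d, s):
    """Identifier of tuple number d (1 <= d <= s): r,...,r,b,g,...,g with b at index d."""
    if not 1 <= d <= s:
        raise ValueError("tuple numbers range over 1..s")
    return ["r"] * d + ["b"] + ["g"] * (s - d)

s = 4
colors = [encode_color(c, s) for c in ("red", "green", "blue")]
identifiers = [encode_identifier(d, s) for d in range(1, s + 1)]
print("".join(identifiers[0]), "".join(colors[2]))  # rbggg brrrr
```

Since the $b$ of identifier $d$ sits at index $d \geq 1$, the first position of every identifier is $r$, which is what separates identifiers from the green and blue encodings at $X_i^0$.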






The new relation $\rho^*$ thus has $dom(\rho^*) = \{r, g, b\}$ and

$scope(\rho^*) = (X_1^0, X_1^1, \dots, X_1^s, X_2^0, X_2^1, \dots, X_2^s, \dots, X_n^0, X_n^1, \dots, X_n^s)$.

CLAIM: $sol(\Pi_{S_2}(\rho^*)) - rel(\rho^*)$ is non-empty iff $G$ is 3-colorable.

The if-part is not hard to see from our construction. In fact, each correct graph coloring $\tau$ gives rise to a tuple $t$ in $sol(\Pi_{S_2}(\rho^*)) - rel(\rho^*)$ whose vectorized component $t[X_i^0, \dots, X_i^s]$ representing vertex $v_i$ consists of the encoding of the color $\tau(v_i)$.

Let us now prove the only-if part. Assume there exists a tuple $t$ in $sol(\Pi_{S_2}(\rho^*)) - rel(\rho^*)$. We can show by similar arguments as in the proof of Theorem 4.3 that $G$ must be 3-colorable. This is shown by the following successively derived facts:

1. Tuple $t$ can never have value $b$ in an $X_i^j$-component with $j \neq 0$. In fact, if it had a $b$ assigned to a variable $X_i^l$ with $l \neq 0$, this assignment would occur in a single tuple $t'$ of $rel(\rho^*)$ only. Therefore, each relation $rel(c)$ of any constraint $c$ of $\Pi_{S_2}(\rho^*)$ whose scope contains $X_i^l$ would contain a single tuple having $X_i^l = b$. But this means that the join of all relations of $\Pi_{S_2}(\rho^*)$ contains a single tuple having $X_i^l = b$, namely $t'$ itself. But this would imply $t = t' \in rel(\rho^*)$, which is a contradiction.

2. No pair of consecutive values of any block $t[X_i^1, \dots, X_i^s]$, for $1 \leq i \leq n$, can coincide with $rg$ or $gr$. In fact, by construction, neither $rg$ nor $gr$ occurs as consecutive values in two consecutive columns labeled $X_i^l, X_i^{l+1}$ of $rel(\rho^*)$, where $l \geq 1$. Therefore, no relation $rel(c)$ of any constraint $c$ of $\Pi_{S_2}(\rho^*)$ whose scope is $\{X_i^l, X_i^{l+1}\}$, where $l \geq 1$, contains the tuple $rg$ or the tuple $gr$. It follows that the join $sol(\Pi_{S_2}(\rho^*))$ cannot contain any tuple having $rg$ or $gr$ in consecutive components corresponding to the variables (attributes) $X_i^l, X_i^{l+1}$, where $l \geq 1$. Given that $t \in sol(\Pi_{S_2}(\rho^*))$, the same follows for tuple $t$.

3. For each $1 \leq i \leq n$, the block $t[X_i^1, \dots, X_i^s]$ is made entirely of the same value, namely, either $r$ or $g$. This follows immediately from the above facts 1 and 2.

4. For $1 \leq i \leq n$, each block of values $t[X_i^0, \dots, X_i^s]$ precisely encodes one of the colors red, green, or blue, according to our encoding scheme. To show this, it is sufficient to show that for $1 \leq i \leq n$, if $t[X_i^1] = r$ then $t[X_i^0] \in \{r, b\}$, and if $t[X_i^1] = g$ then $t[X_i^0] = g$. This is shown just in the same way as Fact 2 above. By construction of $\rho^*$, the same property holds for each tuple of $\rho^*$, and thus for all the constraints with scope $\{X_i^0, X_i^1\}$ of $\Pi_{S_2}(\rho^*)$. Therefore, the property must also hold for each tuple of the join $sol(\Pi_{S_2}(\rho^*))$ of $\Pi_{S_2}(\rho^*)$, and thus, in particular, for $t$.

5. For each edge $(v_a, v_b) \in E$, the blocks $t[X_a^0, \dots, X_a^s]$ and $t[X_b^0, \dots, X_b^s]$ represent different colors. To show this, define $X_{(a)} := X_a^s$ if $t[X_a^0] = r$ and $X_{(a)} := X_a^0$ otherwise. Similarly, define $X_{(b)} := X_b^s$ if $t[X_b^0] = r$ and $X_{(b)} := X_b^0$ otherwise. Let $q$ be the constraint of $\Pi_{S_2}(\rho^*)$ with $scope(q) = (X_{(a)}, X_{(b)})$. Clearly, $t[X_{(a)}, X_{(b)}] \in rel(q)$. Thus there is a tuple $t' \in rel(\rho^*)$ such that $t[X_{(a)}, X_{(b)}] = t'[X_{(a)}, X_{(b)}]$. However, due to the particular value-position combinations, neither $t'[X_{(a)}]$ nor $t'[X_{(b)}]$ can be part of a tuple identifier, and they thus jointly represent a legal coloring of the edge $(v_a, v_b)$ of $G$. Since this is true for all edges $(v_a, v_b)$ of $G$, all edges of $G$ are correctly colored by the coloring expressed by tuple $t$.

Therefore, $G$ is 3-colorable. This concludes the proof of the only-if part of our claim, and thus the proof of our theorem. □